To: J3 J3/21-115 From: Richard Bleikamp & JOR Subject: SPLIT intrinsic rework Date: 2021-February-22 Reference: 21-007, 21-109 Milan Curcic proposed separating the 3rd form of the new SPLIT intinisic from the other forms, and either just keeping the 2 forms of SPLIT that tokenize the whole string all at once, or making the 3rd form of SPLIT, the serialized form, a new intrinsic. After discussions with JOR and other interested parties, a partial concensus was achieved. The proposed edits allow for keeping the 3rd form of SPLIT as SPLIT, and creating a new intrinsic for the two "all at once" forms, named TOKENIZE, or something similar. We could delete the 3rd form of the intrinsic, since it is similar to SCAN, but subgroup did not reach a concensus on this point. STRAW VOTE: should the 3rd serialized form of SPLIT be retained? a: keep the 3rd serialized form as well as SCAN b: keep the 3rd serialized form as well and deprecate SCAN c: delete the 3rd serialized form of SPLIT d: undecided Possible straw vote: is TOKENIZE the appropriate name for the first two forms of the old SPLIT intrinsic? If not, what is? The edits below are against 21-007. They are segregated into two batches. The first batch renames SPLIT to TOKENIZE (or something similar if desired), and removes the 3rd serialized form. The second batch of edits adds the 3rd form of SPLIT back to the 007, named SPLIT (or something similar). Depending on the straw vote above, we can move/approve the first batch of edits only, effectively deleting the 3rd form. We also simplified limitations for the value of POS (3rd form), since return values from SPLIT aren't affected by the simplification. Note that Bill's suggested edits for the original SPLIT, to prohibit coarrays and coindexed objects as arguments, is included (cut and pasted verbatim from 21-109, except substituting TOKENIZE for SPLIT). Bill's changes only affected the first two forms of the old SPLIT intrinsic, not the 3rd serialized form. ---------- Edits to 21-007, BATCH 1: [xiii, 2] under Intrinsic procedures and modules: Change "SPLIT" to "TOKENIZE" in the list of new intrinsics. [348, before line 1] In table 16.1, "Standard generic intrinsic procedure summary" Delete the 3rd form of SPLIT, and the word "or" at the end of the preceding line AND Rename SPLIT as TOKENIZE, and place the entry for TOKENIZE after the entry for TINY. Renumber as appropriate (16.9.xxx) [432:23] Delete the 3rd form of SPLIT (renamed TOKENSIZE above), and delete the word "or" from the preceding line (line number may vary after reordering in immediately precedding edit). [432:33-34] In 16.9.194 SPLIT (renamed TOKENIZE above), in the description of the TOKENS argument change the second sentence: "It is an INTENT(OUT) argument." to "It is an INTENT(OUT) argument. The corresponding actual argument shall not be a coarray or a coindexed object." [433:4-5] In 16.9.194 SPLIT (renamed TOKENIZE above), in the description of the SEPARATOR argument change the second sentence: "It is an INTENT(OUT) argument." to "It is an INTENT(OUT) argument. The corresponding actual argument shall not be a coarray or a coindexed object." [433:9] In 16.9.194 SPLIT (renamed TOKENIZE above), in the description of the FIRST argument change the second sentence: "It is an INTENT(OUT) argument." to "It is an INTENT(OUT) argument. The corresponding actual argument shall not be a coarray or a coindexed object." [433:15] In 16.9.194 SPLIT (renamed TOKENIZE above), in the description of the LAST argument change the second sentence: "It is an INTENT(OUT) argument." to "It is an INTENT(OUT) argument. The corresponding actual argument shall not be a coarray or a coindexed object." [433:20-36] Delete all the descriptions for POS and BACK. [434:3-19] Delete the example thats uses a DO loop. ---------- end of BATCH 1 edits Edits to 21-007, BATCH 2: [xiii, 2] under Intrinsic procedures and modules: Add this sentence after the description of the new TOKENIZE. "The intrinsic subroutine SPLIT parses a string into tokens, one at time." [348, 1-] in table 16.1, "Standard generic intrinsic procedure summary" Add an entry for SPLIT after the entry for SPACING "SPLIT (STRING, SET, POS [, BACK])" [432:22+] After the entry for SPACING, add a new entry (note, all the added text below is identical to what was in 21-007, for SPLIT, but just those parts that apply to the 3rd/serialized form, EXCEPT that the Description has been reworded) "16.9.xxx SPLIT (STRING, SET, POS [, BACK]) Description. Parse a string into tokens, one at a time. Class. Subroutine. Arguments. STRING shall be a scalar of type character. It is an INTENT (IN) argument. SET shall be a scalar of type character with the same kind type parameter as STRING. It is an INTENT (IN) argument. Each character in SET is a token delimiter. A sequence of zero or more characters in STRING delimited by any token delimiter, or the beginning or end of STRING, comprise a token. Thus, two consecutive token delimiters in STRING, or a token delimiter in the first or last character of STRING, indicate a token with zero length. POS shall be an integer scalar. It is an INTENT (INOUT) argument. The value of POS shall be in the range 0 < POS <= LEN (STRING) + 1. If BACK is absent or is present with the value false, POS is assigned the position of the leftmost token delimiter in STRING whose position is greater than POS, or if there is no such character, it is assigned a value one greater than the length of STRING. This identifies a token with starting position one greater than the value of POS on invocation, and ending position one less than the value of POS on return. If BACK is present with the value true, POS is assigned the position of the rightmost token delimiter in STRING whose position is less than POS, or if there is no such character, it is assigned the value zero. This identifies a token with ending position one less than the value of POS on invocation, and starting position one greater than the value of POS on return. If SPLIT is invoked with a value for POS in the range 1 <= POS <= LEN (STRING), and the value of STRING (POS:POS) is not equal to any character in SET, the token identified by SPLIT will not comprise a complete token as described in the description of the SET argument, but rather a partial token. BACK (optional) shall be a logical scalar. It is an INTENT (IN) argument. Example. Execution of CHARACTER (LEN=:), ALLOCATABLE :: INPUT CHARACTER (LEN=2) :: SET = ', ' INTEGER P INPUT = "one,last example" P = 0 DO IF (P > LEN (INPUT)) EXIT ISTART = P + 1 CALL SPLIT (INPUT, SET, P) IEND = P - 1 PRINT '(T7,A)', INPUT (ISTART:IEND) END DO will print one last example" ---------- end of BATCH 2 edits