To: J3 J3/21-115r3 From: Richard Bleikamp & JOR Subject: SPLIT intrinsic rework Date: 2021-March-02 Reference: 21-007, 21-109 This r2 paper adds edits to make TOKENIZE() and SPLIT() PURE and SIMPLE, and update the relevant tables, etc. Also, the straw vote has been removed, and both batches of edits have been combined into one set of edits. - - - - - Milan Curcic proposed separating the 3rd form of the new SPLIT intrinisic from the other forms, and either just keeping the 2 forms of SPLIT that tokenize the whole string all at once, or making the 3rd form of SPLIT, the serialized form, a new intrinsic. JOR also notes that the names "TOKENIZE" and "SPLIT" can be changed to something more appropriate, but JOR has no objections to these names. Note that a straw vote at meeting #223 (19-0-3-2) was in favor of renaming the first two forms of the old SPLIT as TOKENIZE, and keeping the 3rd (serialized) form as SPLIT. The edits below do that. Note that Bill's suggested edits for the original SPLIT, to prohibit coarrays and coindexed objects as arguments, is included (cut and pasted verbatim from 21-109, except substituting TOKENIZE for SPLIT). Bill's changes only affected the first two forms of the old SPLIT intrinsic, not the 3rd serialized form. Edits to 21-007 [xiii, 2] under Intrinsic procedures and modules: Change "SPLIT" to "TOKENIZE" in the list of new intrinsics. [xiii, 2] under Intrinsic procedures and modules: Add this sentence just before the description of the new TOKENIZE. "The intrinsic subroutine SPLIT parses a string into tokens, one at time." [338:17] Immediately before "and the elemental subroutine MVBITS," Insert "the subroutine SPLIT, the subroutine TOKENIZE, " [343:18+] add a new entry "SS indicates that the procedure is a simple subroutine." [348, before line 1] In table 16.1, "Standard generic intrinsic procedure summary" Delete the 3rd form of SPLIT, and the word "or" at the end of the preceding line AND Rename SPLIT as TOKENIZE, and place the entry for TOKENIZE after the entry for TINY. Make its "class" "SS". Renumber as appropriate (16.9.xxx) [348, 1-] in table 16.1, "Standard generic intrinsic procedure summary" Add an entry for SPLIT after the entry for SPACING "SPLIT (STRING, SET, POS [, BACK])" with class "SS" and a description of "Parse a string into tokens, one at a time." [432:23] Delete the 3rd form of SPLIT (renamed TOKENSIZE above), and delete the word "or" from the preceding line (line number may vary after reordering in the edit above for [348, before line 1]). [432:25] Change "Class. Subroutine." to "Class. Simple subroutine." [432:33-34] In 16.9.194 SPLIT (renamed TOKENIZE above), in the description of the TOKENS argument change the second sentence: "It is an INTENT(OUT) argument." to "It is an INTENT(OUT) argument. The corresponding actual argument shall not be a coarray or a coindexed object." [433:4-5] In 16.9.194 SPLIT (renamed TOKENIZE above), in the description of the SEPARATOR argument change the second sentence: "It is an INTENT(OUT) argument." to "It is an INTENT(OUT) argument. The corresponding actual argument shall not be a coarray or a coindexed object." [433:9] In 16.9.194 SPLIT (renamed TOKENIZE above), in the description of the FIRST argument change the second sentence: "It is an INTENT(OUT) argument." to "It is an INTENT(OUT) argument. The corresponding actual argument shall not be a coarray or a coindexed object." [433:15] In 16.9.194 SPLIT (renamed TOKENIZE above), in the description of the LAST argument change the second sentence: "It is an INTENT(OUT) argument." to "It is an INTENT(OUT) argument. The corresponding actual argument shall not be a coarray or a coindexed object." [433:20-36] Delete all the descriptions for POS and BACK. They are added back to the new SPLIT by the edits to [432:22+] below. [434:3-19] Delete the example thats uses a DO loop. They are added back to the new SPLIT by the edits to [432:22+] below. [432:22+] After the entry for SPACING, add a new entry (note, all the added text below is identical to what was in 21-007, for SPLIT, but just those parts that apply to the 3rd/serialized form, EXCEPT that the Description has been reworded) "16.9.xxx SPLIT (STRING, SET, POS [, BACK]) Description. Parse a string into tokens, one at a time. Class. Simple Subroutine. Arguments. STRING shall be a scalar of type character. It is an INTENT (IN) argument. SET shall be a scalar of type character with the same kind type parameter as STRING. It is an INTENT (IN) argument. Each character in SET is a token delimiter. A sequence of zero or more characters in STRING delimited by any token delimiter, or the beginning or end of STRING, comprise a token. Thus, two consecutive token delimiters in STRING, or a token delimiter in the first or last character of STRING, indicate a token with zero length. POS shall be an integer scalar. It is an INTENT (INOUT) argument. The value of POS shall be in the range 0 < POS <= LEN (STRING) + 1. If BACK is absent or is present with the value false, POS is assigned the position of the leftmost token delimiter in STRING whose position is greater than POS, or if there is no such character, it is assigned a value one greater than the length of STRING. If BACK is present with the value true, POS is assigned the position of the rightmost token delimiter in STRING whose position is less than POS, or if there is no such character, it is assigned the value zero. If SPLIT is invoked with a value for POS in the range 1 <= POS <= LEN (STRING), and the value of STRING (POS:POS) is not equal to any character in SET, the token identified by SPLIT will not comprise a complete token as described in the description of the SET argument, but rather a partial token. BACK (optional) shall be a logical scalar. It is an INTENT (IN) argument. Example. Execution of CHARACTER (LEN=:), ALLOCATABLE :: INPUT CHARACTER (LEN=2) :: SET = ', ' INTEGER P INPUT = "one,last example" P = 0 DO IF (P > LEN (INPUT)) EXIT ISTART = P + 1 CALL SPLIT (INPUT, SET, P) IEND = P - 1 PRINT '(T7,A)', INPUT (ISTART:IEND) END DO will print one last example" ---------- end of edits