To: J3                                                       J3/25-199
From: Brandon Cook & Dan Bonachea
Subject: Edits for US20 Collective Subroutines for Prefix Reductions
Date: 2025-October-26

References: 25-144r1, 25-177r1, 25-166r2, 25-195r1, 25-127r1,
            25-007r1, WG5/N-2239

1. Background
=============

The Fortran 202Y work list (WG5/N-2239) includes work item US20:
"Add Intrinsic and collective subroutines for prefix operations"

Paper 25-144r1 "Requirements for US20: Collective Subroutines for
Prefix Operations" presents illustrative use cases and requirements
for collective subroutines for prefix reduction. That paper was
passed at J3 meeting #236 in June 2025.

Paper 25-177r1, which gives specifications and syntax for the
collective subroutine variants of prefix reduction operations, was
passed at J3 meeting #237 in October 2025.

2. Syntax Adjustments
=====================

Since the passage of 25-177r1, papers 25-166r2 and 25-195r1 have
suggested additional syntax adjustments in order to maintain
uniformity with closely related features under concurrent development.

Syntax changes in this paper, relative to 25-177r1, are as follows:

1. Additional forms have been introduced to accommodate the presence
   of the TEAM argument (work item DIN1, 25-127r1) and the COMPLETION
   argument (work item US04, 25-166r2).

2. The IDENTITY argument to CO_REDUCE_PREFIX_EXCLUSIVE has been
   renamed to INITIAL (as recommended by 25-195r1).

Note that a combined edits paper for orthogonal work items DIN1 and
US04 is still forthcoming, which will provide the edits in section
16.6 that are cross-referenced by the edits in this paper.

3. Edits Relative to 25-007r1
=============================

-------------------------------------------------------------------------

[xv] Add to "Intrinsic procedures" the sentence:

"The new intrinsic subroutines CO_SUM_PREFIX_INCLUSIVE,
CO_SUM_PREFIX_EXCLUSIVE, CO_REDUCE_PREFIX_INCLUSIVE, and
CO_REDUCE_PREFIX_EXCLUSIVE perform collective prefix reduction
operations across images."
-------------------------------------------------------------------------

[383] In 16.7 Standard generic intrinsic procedures, Table 16.1,
after the entry for CO_REDUCE add two new entries (with four forms
each):

"
CO_REDUCE_PREFIX_EXCLUSIVE (A, OPERATION, INITIAL [, STAT, ERRMSG]) or \
    C   Generalized exclusive prefix reduction across images.
CO_REDUCE_PREFIX_EXCLUSIVE (A, OPERATION, INITIAL, COMPLETION \
                            [, STAT, ERRMSG]) or
CO_REDUCE_PREFIX_EXCLUSIVE (A, OPERATION, INITIAL, TEAM \
                            [, STAT, ERRMSG]) or
CO_REDUCE_PREFIX_EXCLUSIVE (A, OPERATION, INITIAL, TEAM, COMPLETION \
                            [, STAT, ERRMSG])
"

and:

"
CO_REDUCE_PREFIX_INCLUSIVE (A, OPERATION [, STAT, ERRMSG]) or \
    C   Generalized inclusive prefix reduction across images.
CO_REDUCE_PREFIX_INCLUSIVE (A, OPERATION, COMPLETION \
                            [, STAT, ERRMSG]) or
CO_REDUCE_PREFIX_INCLUSIVE (A, OPERATION, TEAM \
                            [, STAT, ERRMSG]) or
CO_REDUCE_PREFIX_INCLUSIVE (A, OPERATION, TEAM, COMPLETION \
                            [, STAT, ERRMSG])
"

-------------------------------------------------------------------------

[383] In 16.7 Standard generic intrinsic procedures, Table 16.1,
after the entry for CO_SUM add two new entries (with four forms each):

"
CO_SUM_PREFIX_EXCLUSIVE (A [, STAT, ERRMSG]) or \
    C   Compute exclusive prefix sum across images.
CO_SUM_PREFIX_EXCLUSIVE (A, COMPLETION [, STAT, ERRMSG]) or
CO_SUM_PREFIX_EXCLUSIVE (A, TEAM [, STAT, ERRMSG]) or
CO_SUM_PREFIX_EXCLUSIVE (A, TEAM, COMPLETION [, STAT, ERRMSG])
"

and:

"
CO_SUM_PREFIX_INCLUSIVE (A [, STAT, ERRMSG]) or \
    C   Compute inclusive prefix sum across images.
CO_SUM_PREFIX_INCLUSIVE (A, COMPLETION [, STAT, ERRMSG]) or
CO_SUM_PREFIX_INCLUSIVE (A, TEAM [, STAT, ERRMSG]) or
CO_SUM_PREFIX_INCLUSIVE (A, TEAM, COMPLETION [, STAT, ERRMSG])
"

-------------------------------------------------------------------------

[411:20+] In 16.9 Specifications of the standard intrinsic procedures,
after the specification of CO_REDUCE, add:

16.9.?? \
CO_REDUCE_PREFIX_EXCLUSIVE (A, OPERATION, INITIAL [, STAT, ERRMSG]) or
CO_REDUCE_PREFIX_EXCLUSIVE (A, OPERATION, INITIAL, COMPLETION \
                            [, STAT, ERRMSG]) or
CO_REDUCE_PREFIX_EXCLUSIVE (A, OPERATION, INITIAL, TEAM \
                            [, STAT, ERRMSG]) or
CO_REDUCE_PREFIX_EXCLUSIVE (A, OPERATION, INITIAL, TEAM, COMPLETION \
                            [, STAT, ERRMSG])

Description. Generalized exclusive prefix reduction across images.

Class. Collective subroutine.

Arguments.

A          shall not be polymorphic. It shall not be of a type with an
           ultimate component that is allocatable or a pointer. It
           shall have the same shape, type, and type parameter values,
           in corresponding references. It shall not be a coindexed
           object. It is an INTENT (INOUT) argument.

           If A is scalar, the computed value provided to any given
           image is the result of the exclusive prefix reduction
           operation described below. If A is an array, each element
           of the computed value provided to any given image is equal
           to the result of the exclusive prefix reduction operation
           described below, as applied to corresponding elements of A
           in corresponding references. The computed value is assigned
           to A if no error condition occurs. Otherwise, A becomes
           undefined.

INITIAL    shall be a scalar with the same declared type and type
           parameter values as A. It is an INTENT (IN) argument.
           INITIAL shall have the same value in corresponding
           references.

OPERATION  shall be a pure function with exactly two arguments; the
           result and each argument shall be a scalar, nonallocatable,
           noncoarray, nonpointer, nonpolymorphic data object with the
           same type and type parameter values as A. The arguments
           shall not be optional. If one argument has the
           ASYNCHRONOUS, TARGET, or VALUE attribute, the other shall
           have that attribute. OPERATION shall implement a
           mathematically associative operation. OPERATION shall be
           the same function on all images in corresponding
           references.

           The computed value for an exclusive prefix reduction over a
           list of values is the result of an iterative process.
           Each scalar input value provided by image i in the
           specified team is referred to as A_i. The corresponding
           computed result value provided to image i in the specified
           team is referred to as R_i. S_i is initially the ordered
           list [INITIAL, A_1, ..., A_{i-1}]. Each iteration starts
           with a processor-dependent choice of item x from the list
           S_i. Adjacent items x and y (where x precedes y) are
           removed from the list and replaced with the value of
           OPERATION(x, y). The process terminates when the list has
           only one item; this is the computed value of R_i.

TEAM       shall be a scalar of type TEAM_TYPE from the intrinsic
           module ISO_FORTRAN_ENV. It is an INTENT (IN) argument.

COMPLETION shall be a scalar of type COMPLETION_TYPE from the
           intrinsic module ISO_FORTRAN_ENV. It is an INTENT (INOUT)
           argument.

STAT (optional) shall be a noncoindexed integer scalar with a decimal
           exponent range of at least four. It is an INTENT (OUT)
           argument.

ERRMSG (optional) shall be a noncoindexed default character scalar.
           It is an INTENT (INOUT) argument.

The semantics of TEAM, COMPLETION, STAT and ERRMSG are described in
16.6.

Example. The subroutine below demonstrates how to use
CO_REDUCE_PREFIX_EXCLUSIVE to perform a collective exclusive prefix
reduction analogous to the intrinsic function MAXLOC:

  SUBROUTINE co_prefix_maxloc(value, image)
    USE, INTRINSIC :: IEEE_ARITHMETIC, ONLY: IEEE_VALUE, IEEE_NEGATIVE_INF
    REAL, INTENT(INOUT) :: value
    INTEGER, INTENT(OUT) :: image
    TYPE :: tuple
      REAL :: value
      INTEGER :: image
    END TYPE
    TYPE(tuple) :: t

    t = tuple(value, THIS_IMAGE())
    CALL CO_REDUCE_PREFIX_EXCLUSIVE(t, OPERATION=find_maxloc, &
           INITIAL=tuple(IEEE_VALUE(1.0,IEEE_NEGATIVE_INF), 0) )
    value = t%value ! The largest value provided by a prior image,
    image = t%image ! .. and the index of that image,
  CONTAINS
    PURE FUNCTION find_maxloc(lhs,rhs) RESULT(maxloc)
      TYPE(tuple), INTENT(IN) :: lhs,rhs
      TYPE(tuple) :: maxloc
      maxloc = MERGE(lhs, rhs, lhs%value >= rhs%value)
    END FUNCTION find_maxloc
  END SUBROUTINE co_prefix_maxloc


16.9.?? \
CO_REDUCE_PREFIX_INCLUSIVE (A, OPERATION [, STAT, ERRMSG]) or
CO_REDUCE_PREFIX_INCLUSIVE (A, OPERATION, COMPLETION \
                            [, STAT, ERRMSG]) or
CO_REDUCE_PREFIX_INCLUSIVE (A, OPERATION, TEAM \
                            [, STAT, ERRMSG]) or
CO_REDUCE_PREFIX_INCLUSIVE (A, OPERATION, TEAM, COMPLETION \
                            [, STAT, ERRMSG])

Description. Generalized inclusive prefix reduction across images.

Class. Collective subroutine.

Arguments.

A          shall not be polymorphic. It shall not be of a type with an
           ultimate component that is allocatable or a pointer. It
           shall have the same shape, type, and type parameter values,
           in corresponding references. It shall not be a coindexed
           object. It is an INTENT (INOUT) argument.

           If A is scalar, the computed value provided to any given
           image is the result of the inclusive prefix reduction
           operation described below. If A is an array, each element
           of the computed value provided to any given image is equal
           to the result of the inclusive prefix reduction operation
           described below, as applied to corresponding elements of A
           in corresponding references. The computed value is assigned
           to A if no error condition occurs. Otherwise, A becomes
           undefined.

OPERATION  shall be a pure function with exactly two arguments; the
           result and each argument shall be a scalar, nonallocatable,
           noncoarray, nonpointer, nonpolymorphic data object with the
           same type and type parameter values as A. The arguments
           shall not be optional. If one argument has the
           ASYNCHRONOUS, TARGET, or VALUE attribute, the other shall
           have that attribute. OPERATION shall implement a
           mathematically associative operation. OPERATION shall be
           the same function on all images in corresponding
           references.

           The computed value for an inclusive prefix reduction over a
           list of values is the result of an iterative process. Each
           scalar input value provided by image i in the specified
           team is referred to as A_i. The corresponding computed
           result value provided to image i in the specified team is
           referred to as R_i. S_i is initially the ordered list
           [A_1, ..., A_i]. Each iteration starts with a
           processor-dependent choice of item x from the list S_i.
           Adjacent items x and y (where x precedes y) are removed
           from the list and replaced with the value of
           OPERATION(x, y). The process terminates when the list has
           only one item; this is the computed value of R_i.

TEAM       shall be a scalar of type TEAM_TYPE from the intrinsic
           module ISO_FORTRAN_ENV. It is an INTENT (IN) argument.

COMPLETION shall be a scalar of type COMPLETION_TYPE from the
           intrinsic module ISO_FORTRAN_ENV. It is an INTENT (INOUT)
           argument.

STAT (optional) shall be a noncoindexed integer scalar with a decimal
           exponent range of at least four. It is an INTENT (OUT)
           argument.

ERRMSG (optional) shall be a noncoindexed default character scalar.
           It is an INTENT (INOUT) argument.

The semantics of TEAM, COMPLETION, STAT and ERRMSG are described in
16.6.

Example. The subroutine below demonstrates how to use
CO_REDUCE_PREFIX_INCLUSIVE to compute a collective segmented prefix
sum. A segmented prefix sum takes, as input, an ordered list of values
and a corresponding list of logicals; the logicals delineate the
various segments of the prefix sum. For example:

    values:    1  2  4  5  6  7  8  9
    logicals:  F  F  T  T  T  F  F  T
    result:    1  3  4  9 15  7 15  9

Note the segmented_sum operation used below is noncommutative.

  SUBROUTINE co_prefix_segment_sum(value, flag)
    REAL, INTENT(INOUT) :: value
    LOGICAL, INTENT(IN) :: flag
    TYPE :: tuple
      REAL :: value
      LOGICAL :: flag
    END TYPE
    TYPE(tuple) :: t

    t = tuple(value, flag)
    CALL CO_REDUCE_PREFIX_INCLUSIVE(t, OPERATION=segmented_sum)
    value = t%value
  CONTAINS
    PURE FUNCTION segmented_sum(lhs,rhs) RESULT(sum)
      TYPE(tuple), INTENT(IN) :: lhs,rhs
      TYPE(tuple) :: sum
      IF (lhs%flag .eqv. rhs%flag) THEN
        sum%value = lhs%value + rhs%value
      ELSE
        sum%value = rhs%value
      END IF
      sum%flag = rhs%flag
    END FUNCTION segmented_sum
  END SUBROUTINE co_prefix_segment_sum

-------------------------------------------------------------------------

[412:4+] In 16.9 Specifications of the standard intrinsic procedures,
after the specification of CO_SUM, add:

16.9.?? CO_SUM_PREFIX_EXCLUSIVE (A [, STAT, ERRMSG]) or
        CO_SUM_PREFIX_EXCLUSIVE (A, COMPLETION [, STAT, ERRMSG]) or
        CO_SUM_PREFIX_EXCLUSIVE (A, TEAM [, STAT, ERRMSG]) or
        CO_SUM_PREFIX_EXCLUSIVE (A, TEAM, COMPLETION [, STAT, ERRMSG])

Description. Compute exclusive prefix sum across images.

Class. Collective subroutine.

Arguments.

A          shall be of numeric type. It shall have the same shape,
           type, and type parameter values, in corresponding
           references. It shall not be a coindexed object. It is an
           INTENT (INOUT) argument.

           The computed value provided to image one in the specified
           team is equal to the value zero. If A is scalar, the
           computed value provided to any given image i in the
           specified team (with i greater than one) is equal to a
           processor-dependent approximation to the sum of the values
           of A in corresponding references provided by images 1 to
           (i-1) in the specified team. If A is an array, each element
           of the computed value provided to any given image i in the
           specified team (with i greater than one) is equal to a
           processor-dependent approximation to the sum of the values
           in corresponding elements of A in corresponding references
           provided by images 1 to (i-1) in the specified team.
           The computed value is assigned to A if no error condition
           occurs.
           Otherwise, A becomes undefined.

TEAM       shall be a scalar of type TEAM_TYPE from the intrinsic
           module ISO_FORTRAN_ENV. It is an INTENT (IN) argument.

COMPLETION shall be a scalar of type COMPLETION_TYPE from the
           intrinsic module ISO_FORTRAN_ENV. It is an INTENT (INOUT)
           argument.

STAT (optional) shall be a noncoindexed integer scalar with a decimal
           exponent range of at least four. It is an INTENT (OUT)
           argument.

ERRMSG (optional) shall be a noncoindexed default character scalar.
           It is an INTENT (INOUT) argument.

The semantics of TEAM, COMPLETION, STAT and ERRMSG are described in
16.6.

Example. If the number of images in the current team is three and the
value of A is [1, 2] on image one, [3, 4] on image two, and [5, 6] on
image three, after executing the statement
CALL CO_SUM_PREFIX_EXCLUSIVE(A), the value of A is [0, 0] on image
one, [1, 2] on image two, and [4, 6] on image three.


16.9.?? CO_SUM_PREFIX_INCLUSIVE (A [, STAT, ERRMSG]) or
        CO_SUM_PREFIX_INCLUSIVE (A, COMPLETION [, STAT, ERRMSG]) or
        CO_SUM_PREFIX_INCLUSIVE (A, TEAM [, STAT, ERRMSG]) or
        CO_SUM_PREFIX_INCLUSIVE (A, TEAM, COMPLETION [, STAT, ERRMSG])

Description. Compute inclusive prefix sum across images.

Class. Collective subroutine.

Arguments.

A          shall be of numeric type. It shall have the same shape,
           type, and type parameter values, in corresponding
           references. It shall not be a coindexed object. It is an
           INTENT (INOUT) argument.

           If A is scalar, the computed value provided to any given
           image i in the specified team is equal to a
           processor-dependent approximation to the sum of the values
           of A in corresponding references provided by images 1 to i
           in the specified team. If A is an array, each element of
           the computed value provided to any given image i in the
           specified team is equal to a processor-dependent
           approximation to the sum of the values in corresponding
           elements of A in corresponding references provided by
           images 1 to i in the specified team.
           The computed value is assigned to A if no error condition
           occurs.
           Otherwise, A becomes undefined.

TEAM       shall be a scalar of type TEAM_TYPE from the intrinsic
           module ISO_FORTRAN_ENV. It is an INTENT (IN) argument.

COMPLETION shall be a scalar of type COMPLETION_TYPE from the
           intrinsic module ISO_FORTRAN_ENV. It is an INTENT (INOUT)
           argument.

STAT (optional) shall be a noncoindexed integer scalar with a decimal
           exponent range of at least four. It is an INTENT (OUT)
           argument.

ERRMSG (optional) shall be a noncoindexed default character scalar.
           It is an INTENT (INOUT) argument.

The semantics of TEAM, COMPLETION, STAT and ERRMSG are described in
16.6.

Example. If the number of images in the current team is three and the
value of A is [1, 2] on image one, [3, 4] on image two, and [5, 6] on
image three, after executing the statement
CALL CO_SUM_PREFIX_INCLUSIVE(A), the value of A is [1, 2] on image
one, [4, 6] on image two, and [9, 12] on image three.

-------------------------------------------------------------------------

[596:19-20] In Annex A.2 Processor dependencies, replace the following
line:

"* the computed value of the intrinsic subroutine CO_REDUCE (16.9.57)
   and the intrinsic subroutine CO_SUM (16.9.58);"

with the following line:

"* the computed value of the intrinsic subroutines CO_REDUCE (16.9.57),
   CO_REDUCE_PREFIX_EXCLUSIVE (16.9.??),
   CO_REDUCE_PREFIX_INCLUSIVE (16.9.??), CO_SUM (16.9.58),
   CO_SUM_PREFIX_EXCLUSIVE (16.9.??) and
   CO_SUM_PREFIX_INCLUSIVE (16.9.??);"

-------------------------------------------------------------------------

===END===
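
Illustrative note (not part of the edits). The worked examples given
for CO_SUM_PREFIX_EXCLUSIVE and CO_SUM_PREFIX_INCLUSIVE can be
cross-checked with a small serial model. The sketch below is only an
assumption-laden illustration: it does not call the proposed
intrinsics (no processor implements them yet), and it emulates a team
of three images as the columns of an ordinary array. The program name
prefix_sum_model and the variables a, excl, and incl are hypothetical
names chosen for this sketch.

```fortran
! Serial model only: "images" are emulated as columns of an ordinary
! array, so no coarray runtime and none of the proposed intrinsics
! are required.
PROGRAM prefix_sum_model
  IMPLICIT NONE
  INTEGER, PARAMETER :: num_images = 3, n = 2
  INTEGER :: a(n, num_images)     ! a(:, i) models the value of A on image i
  INTEGER :: excl(n, num_images)  ! modeled exclusive prefix sums
  INTEGER :: incl(n, num_images)  ! modeled inclusive prefix sums
  INTEGER :: i

  ! Input values from the examples: [1, 2], [3, 4], [5, 6].
  a = RESHAPE([1, 2, 3, 4, 5, 6], [n, num_images])

  DO i = 1, num_images
    ! Inclusive: elementwise sum over images 1 to i.
    incl(:, i) = SUM(a(:, 1:i), DIM=2)
    ! Exclusive: elementwise sum over images 1 to (i-1). For i = 1 the
    ! section is zero-sized, so SUM yields zero for every element.
    excl(:, i) = SUM(a(:, 1:i-1), DIM=2)
  END DO

  DO i = 1, num_images
    PRINT '(A,I0,A,2(1X,I0),A,2(1X,I0))', 'image ', i, &
      ': exclusive', excl(:, i), '  inclusive', incl(:, i)
  END DO

  ! Compare against the results stated in the examples.
  IF (ANY(excl /= RESHAPE([0, 0, 1, 2, 4, 6], [n, num_images]))) &
    ERROR STOP 'exclusive prefix sum mismatch'
  IF (ANY(incl /= RESHAPE([1, 2, 4, 6, 9, 12], [n, num_images]))) &
    ERROR STOP 'inclusive prefix sum mismatch'
END PROGRAM prefix_sum_model
```

The model uses integer data, so the sums are exact. For real or
complex A the proposed intrinsics specify only a processor-dependent
approximation, which a serial left-to-right sum like this one need not
reproduce bit for bit.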