To: J3 J3/19-255 From: Gary Klimowicz Subject: Add reductions to DO CONCURRENT (US20) Date: 2019-October-16 Reference: 18-007r1, 19-198r2 Introduction ============ Performing a reduction on the array elements computed in DO CONCURRENT can be inefficient for large arrays. Naive expressions of a reduction are non-conforming or inefficient, requiring synchronization or explicit use of temporaries. At meeting 219, J3 approved continued work on this item, passing paper 19-198r2. Requirements ============ Provide Fortran support for some of the features of the OpenMP reduction clause on numeric and logical intrinsic types. The proposal should support: 1. Extend DO CONCURRENT to include the addition of scalar and array "reduction variables". 2. Allow multiple reduction variables to be defined for a single DO CONCURRENT. 3. Support a limited set of intrinsic binary and n-ary operators as reduction operators. 4. Provide the implicit initialization of the reduction variables consistent with the reduction operators. 5. Have reduction variable semantics aligned with existing locality specifications for DO CONCURRENT. 6. Have behavior analogous to a limited version of the OpenMP reduction clause. Specification ============= 1. Add a new locality-spec to describe reduction variables. Reduction variables are local to each iteration of the loop, initialized with a suitable value upon entry to the loop, but are related to a defined instance of the reduction variable in the scope outside the loop. 2. Provide a limited set of operators and functions that can be applied to reduction variables. This list is currently + * .AND. .OR. .EQV. .NEQV. MIN MAX IAND IOR IEOR 3. Provide for default initialization of reduction variables. A suitable initial value of an induction variable can be determined by its reduction operator. 4. Provide a limited number of ways that reduction variables can be defined in an iteration. We will limit the use of reduction variables to simplify implementation: var = var op expression var = expression op var var = function (var, expression) var = function (expression, var) for suitable values of op and function. 5. Upon normal completion of the DO CONCURRENT the value of each reduction variable will be as if the value of the reduction variable prior to the execution of the DO CONCURRENT was combined with each value of the reduction variable from each iteration of the loop via the reduce operator. This combination could be performed in any order. 6. The processor may make assumptions about how to parallelize the DO CONCURRENT iterations to optimize or eliminate synchronized access to the reduction variables. Discussion and Straw Votes ========================== These will affect the edits: 1. Does it make sense for a reduction variable in an iteration to have the POINTER, VOLATILE, ASYNCHRONOUS attributes? I don't think so, but these were included in LOCAL and LOCAL_INIT for a reason. Straw vote: 1a: Remove POINTER? yes; no; undecided 1b: Remove VOLATILE? yes; no; undecided 1c: Remove ASYNCHRONOUS? yes; no; undecided 2. OpenMP specifies that a POINTER reduction value gets a local copy of the pointed-to value in each iteration. Is that how this should be handled? Straw vote: If the result of 1a is not "remove", should we define that the pointer target is made local to the iteration? Yes; no; undecided 3. The OpenMP spec 5.0 for reductions (beginning p.293) uses terminology of implicitly defined variables omp_in, omp_out and omp_priv to describe the behavior. Is that appropriate for the Fortran standard, or should we use terminology more based on how we describe the shadowing definitions of LOCAL and LOCAL_INIT locality-specs? It seems like it would simplify the wording needed, but feels un-Fortran-like. Straw vote: Describe REDUCE in terms of private, in, out instances of variables "it behaves as if ..." for the reduction variables. 4. This version includes array reduction variables. DO CONCURRENT (and OpenMP) only specifies the use of variable names for reduction variables, not array sections or such. Straw vote: Is that sufficient for REDUCE? Yes; undecided" (no "no") Edits to 18-007r1 ================= { edits in page order } { add remark to Introduction in the bullet for Execution control: } [xiii] add: "The REDUCE locality specifier for DO CONCURRENT provides the ability to define reduction variables to be calculated on the loop." { Introduction briefly describes new features } { 11.1.7.2 Form of the DO construct } { Extend the definition of locality-spec to include reduction variables} [181:9] now reads: "R1130 locality-spec is LOCAL ( variable-name-list ) or LOCAL_INIT ( variable-name-list ) or SHARED ( variable-name-list ) or DEFAULT ( NONE )" after "DEFAULT ( NONE )" add "or REDUCE (reduce-op : variable-name-list)" so the resulting rule reads "R1130 locality-spec is LOCAL ( variable-name-list ) or LOCAL_INIT ( variable-name-list ) or SHARED ( variable-name-list ) or DEFAULT ( NONE ) or REDUCE (reduce-op : variable-name-list)" { Add to rule 1130 the new locality-spec } { Define a limited set of binary operators and procedures that can be used to combine values of the reduction variables} [181:9+] After rule 1130, add new rules: "R1130a reduce-op is binary-reduce-op or function-reduce-op R1130b binary-reduce-op is + or * or .AND. or .OR. or .EQV. or .NEQV. R1130c function-reduce-op is MIN or MAX or IAND or IOR or IEOR" { The list of permissible reduction operations is defined } { Add REDUCE to list of restricted locality-specs in C1128 } [181:22-26] Constraint C1128 now reads: "C1128 A variable-name that appears in a LOCAL or LOCAL_INIT locality-spec shall not have the ALLOCATABLE, INTENT (IN), or OPTIONAL attribute, shall not be of finalizable type, shall not be a nonpointer polymorphic dummy argument, and shall not be a coarray or an assumed-size array. A variable-name that is not permitted to appear in a variable definition context shall not appear in a LOCAL or LOCAL_INIT locality-spec." On line 22, replace "LOCAL or LOCAL_INIT locality-spec" with "LOCAL, LOCAL_INIT or REDUCE locality-spec" On line 25, replace "LOCAL or LOCAL_INIT locality-spec" with "LOCAL, LOCAL_INIT or REDUCE locality-spec" so that constraint C1128 now reads: "C1128 A variable-name that appears in a LOCAL, LOCAL_INIT or REDUCE locality-spec shall not have the ALLOCATABLE, INTENT (IN), or OPTIONAL attribute, shall not be of finalizable type, shall not be a nonpointer polymorphic dummy argument, and shall not be a coarray or an assumed-size array. A variable-name that is not permitted to appear in a variable definition context shall not appear in a LOCAL, LOCAL_INIT or REDUCE locality-spec." { REDUCE variables are not allowed to be ALLOCATABLE, INTENT(IN) or OPTIONAL } { REDUCE variables are of limited type } [181:26+] After constraint C1128, add a new constraint: "C1128+ A variable that appears in a REDUCE locality-spec shall be a scalar or array of type corresponding to its reduce-op as defined in Table 11.1." [181:26+] Somewhere near the new constraint, add this table: " Table 11.1. The types of the allowed operands for each reduction operator. OPERATOR ALLOWED TYPES -------- ------------- + integer, real, complex * integer, real, complex .AND. logical .OR. logical .EQV. logical .NEQV. logical MIN integer, real MAX integer, real IAND integer IOR integer IEOR integer" { REDUCE variables are limited to scalars or arrays of numeric or logical types } { 11.1.7.5 Additional semantics for DO CONCURRENT constructs } { Add REDUCE as a new kind of locality } [184:28-31 p1] Now reads: "The locality of a variable that appears in a DO CONCURRENT construct is LOCAL, LOCAL_INIT, SHARED, or unspecified. A construct or statement entity of a construct or statement within the DO CONCURRENT construct has SHARED locality if it has the SAVE attribute. If it does not have the SAVE attribute, it is a different entity in each iteration, similar to LOCAL locality." In line 28, replace "LOCAL, LOCAL_INIT, SHARED, or unspecified" with "LOCAL, LOCAL_INIT, REDUCE, SHARED, or unspecified" so that the first sentence of paragraph 1 reads: "The locality of a variable that appears in a DO CONCURRENT construct is LOCAL, LOCAL_INIT, REDUCE, SHARED, or unspecified. A construct or statement entity of a construct or statement within the DO CONCURRENT construct has SHARED locality if it has the SAVE attribute. If it does not have the SAVE attribute, it is a different entity in each iteration, similar to LOCAL locality." { REDUCE is added to the types of locality } { Describe REDUCE locality as a kind of LOCAL locality } [184:32-38 p2] now reads: "A variable that has LOCAL or LOCAL_INIT locality is a construct entity with the same type, type parameters, and rank as the variable with the same name in the innermost executable construct or scoping unit that includes the DO CONCURRENT construct, and the outside variable is inaccessible by that name within the construct. The construct entity has the ASYNCHRONOUS, CONTIGUOUS, POINTER, TARGET, or VOLATILE attribute if and only if the outside variable has that attribute; it does not have the BIND, INTENT, PROTECTED, SAVE, or VALUE attribute, even if the outside variable has that attribute. If it is not a pointer, it has the same bounds as the outside variable. At the beginning of execution of each iteration," In line 32, replace "A variable that has LOCAL or LOCAL_INIT locality" with "A variable that has LOCAL, LOCAL_INIT, or REDUCE locality" so that the first sentence of paragraph 2 now reads "A variable that has LOCAL, LOCAL_INIT, or REDUCE locality is a construct entity with the same type, type parameters, and rank as the variable with the same name in the innermost executable construct or scoping unit that includes the DO CONCURRENT construct, and the outside variable is inaccessible by that name within the construct." { REDUCE is like a kind of local locality-spec } { Add description of how REDUCE locality is initialized } [184:43+] Add a third bullet entry to paragraph 2 "- a variable with REDUCE locality has the pointer association status of the outside variable. It is defined with the value corresponding to its reduce-op as specified in Table 11.2; the outside variable shall not be an undefined pointer." [184:43+] Add Table 11.2 near the above edit "Table 11.2. The default initial values for each reduction operator. OPERATOR INITIAL VALUE -------- ------------- + 0 * 1 .AND. .TRUE. .OR. .FALSE. .EQV. .TRUE. .NEQV. .FALSE. MIN nnlmsbtp * MAX pnlmsbtp ** IAND NOT(0) IOR 0 IEOR 0 * nnlmsbtp = the value of the negative number of the largest magnitude supported by the processor for numbers of the type and kind of the reduction variable. ** pnlmsbtp = the value of the positive number of the largest magnitude supported by the processor for numbers of the type and kind of the reduction variable." { initial values are defined for variables based on the reduce-op in the reduce locality-spec } { Add REDUCE to the LOCAL and LOCAL_INIT I/O, TARGET descriptions } [185:1-4] Lines 1-4 after the bullet list currently reads: "If a variable with LOCAL or LOCAL_INIT locality becomes an affector of a pending input/output operation, the operation shall have completed before the end of the iteration. If a variable with LOCAL or LOCAL_INIT locality has the TARGET attribute, a pointer associated with it during an iteration becomes undefined when execution of that iteration completes. In line 1, replace "LOCAL or LOCAL_INIT locality" with "LOCAL, LOCAL_INIT, or REDUCE locality" In line 2, replace "LOCAL or LOCAL_INIT locality" with "LOCAL, LOCAL_INIT, or REDUCE locality" so the paragraph now reads: "If a variable with LOCAL, LOCAL_INIT, or REDUCE locality becomes an affector of a pending input/output operation, the operation shall have completed before the end of the iteration. If a variable with LOCAL, LOCAL_INIT, or REDUCE locality has the TARGET attribute, a pointer associated with it during an iteration becomes undefined when execution of that iteration completes." { Add semantics for loop termination and REDUCE variables. I'm struggling with the wording here. } [185:29+] After paragraph 5, which reads: "A DO CONCURRENT construct shall not contain an input/output statement that has an ADVANCE= specifier." Add two new paragraphs: "Variables listed in a REDUCE locality-spec can only appear in the following forms within the body of the DO CONCURRENT: var = var binary-reduce-op expression var = expression binary-reduce-op var var = function-reduce-op( [expression-list ,] var [, expression-list] ) For DO CONCURRENT statements with a REDUCE locality-spec, the value of each reduction variable is its value before the DO CONCURRENT combined via the reduce-op with the values of each iteration's reduction variable at the termination of the iteration. For array reduction variables, this application is done elementwise. The application of the reduction operators to the results of each iteration can be done in any order." { Reduction variables actually reduce } { end }