To: J3                                                     J3/18-143
From: Gary Klimowicz
Subject: Add reductions to DO CONCURRENT
Date: 2018-February-15

Reference: 18-007: 11.1.7 (DO construct) and 16.9.161 (REDUCE)

Introduction
------------
To perform a reduction over values computed in a DO CONCURRENT,
one must construct a temporary array variable to hold the
values to be reduced, and then perform the reduction
outside the DO CONCURRENT.
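
A minimal sketch of that workaround in standard Fortran 2018 follows.
The program name, the array TMP, and the input values are assumptions
made up for illustration; they do not come from any existing code.

```fortran
program reduce_workaround
  implicit none
  integer, parameter :: n = 100
  real :: a(n), b(n), tmp(n), sumsq
  integer :: i

  ! Illustrative input data (assumed values).
  a = 1.0
  b = 2.0

  ! Each iteration writes only its own element of tmp, so the
  ! iterations are independent of one another.
  do concurrent (i = 1:n)
    tmp(i) = (a(i) + b(i)) ** 2
  end do

  ! The reduction itself has to happen outside the DO CONCURRENT.
  sumsq = sum(tmp)
  print *, sumsq
end program reduce_workaround
```

With these inputs every element of TMP is 9.0, so SUMSQ is 900. The
array TMP exists only to carry values out of the loop, which is the
allocation and data movement the proposal aims to avoid.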

For hybrid computing environments, this can involve a substantial
amount of data movement. When the user is most interested in
performing a simple reduction of the data in the DO CONCURRENT,
the data overhead can exceed the gains from the computational
parallelism.

We propose a way of specifying scalar reduction variables that are
managed across the DO CONCURRENT block executions to accumulate one
or more scalar values from each executed block.

We believe this will
* Increase expressiveness of DO CONCURRENT
* Increase opportunities for parallelism
* Reduce memory allocation needs
* Reduce need for memory copy when offloading to target processors

This is an in-Fortran specification of many of the features of
the OpenMP reduction clause.
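
For comparison, here is a sketch of the same kind of sum written with
the OpenMP reduction clause on an ordinary DO loop. The program name
and input values are assumptions for illustration only. Compiled
without OpenMP support, the directives are treated as comments and the
loop runs serially.

```fortran
program omp_reduction
  implicit none
  integer, parameter :: n = 100
  real :: a(n), b(n), sumsq
  integer :: i

  ! Illustrative input data (assumed values).
  a = 1.0
  b = 2.0
  sumsq = 0.0

  ! Each thread accumulates into a private copy of sumsq that starts
  ! at the identity for +; the private copies are combined with the
  ! original variable at the end of the loop.
  !$omp parallel do reduction(+:sumsq)
  do i = 1, n
    sumsq = sumsq + (a(i) + b(i)) ** 2
  end do
  !$omp end parallel do

  print *, sumsq
end program omp_reduction
```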

Rough Idea
----------
The syntax proposed below is provided to add some concreteness to
the explanation of the proposed semantics.

In section 11.1.7.2, rule 1123, add 'concurrent-reduction' after
'concurrent-locality'.

'concurrent-reduction' would be zero or more 'reduction-spec's.
A 'reduction-spec' would specify a result variable, its
initial value, an operation to apply to pairs of values, and
the reduction variable whose values participate in the reduction.

Putting on fuzzy glasses, it could look something like this:
    REDUCE ( result-variable-name = initial-value, operation, &
            reduction-variable-name)

As each block execution of the DO CONCURRENT completes, values of
the reduction-variable are combined with the result-variable
using the specified operation.

Like the index-name variables, the result-variable-name cannot
appear in any LOCAL or LOCAL_INIT specification for the same DO
CONCURRENT statement. I think this must be SHARED locality, if
I understand it correctly.

Like the names in the locality-spec, the result-variable-name and
the reduction-variable-name shall be the names of variables in
the innermost executable construct or scoping unit that includes
the DO CONCURRENT statement. They cannot be the same variable.

The reduction-variable-name is allowed to appear in the LOCAL or
LOCAL_INIT specification for the DO CONCURRENT statement.

The operation (as in REDUCE) is a pure, associative function with
exactly two arguments whose type and type parameters match the
result-variable-name and reduction-variable-name. It would also
make sense for the operation to be a Fortran operator, like +,
*, .AND., .OR., etc.
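
As a sketch of what such an operation could look like, the function
below is pure, mathematically associative, and takes exactly two
arguments of the same type. It is shown with the existing Fortran 2018
REDUCE intrinsic, since the proposed clause is not implemented anywhere.
The module, function, and variable names are hypothetical, and the
example needs a compiler that supports the REDUCE intrinsic.

```fortran
module reduction_ops
  implicit none
contains
  ! A pure function of two scalar arguments. It is associative
  ! because its result is never negative, so regrouping the calls
  ! does not change the final maximum magnitude.
  pure function larger_magnitude(x, y) result(r)
    real, intent(in) :: x, y
    real :: r
    r = max(abs(x), abs(y))
  end function larger_magnitude
end module reduction_ops

program use_reduce
  use reduction_ops
  implicit none
  real :: v(4) = [1.0, -7.5, 3.0, 2.0]
  real :: biggest

  ! REDUCE applies the operation to pairs of values from v.
  biggest = reduce(v, larger_magnitude)
  print *, biggest
end program use_reduce
```

Here BIGGEST is 7.5. Under the proposal, a function like this could
presumably be named as the operation in a REDUCE clause in the same way.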


Silly Example
-------------
For concreteness, here's an example with comments to indicate
the additional semantics.


subroutine f
integer, parameter :: N = 100
real :: A(N), B(N)
real :: SUMSQ, THE_THING

...
! SUMSQ = 0
do concurrent (I = 1:N) REDUCE(SUMSQ = 0, +, THE_THING)
    THE_THING = (A(I)+B(I)) ** 2
    ! SUMSQ = SUMSQ + THE_THING
end do
! SUMSQ has the sum of squares of all A(I)+B(I).

end subroutine f


Naive Semantics
---------------
The semantics is *as if* the construct were executed as follows,
with the order of execution of the assignments to the
result-variable-name undefined. (The ellipses are to indicate
that more than one reduction is allowed in a DO CONCURRENT,
so similar statements would also be executed for the second
and subsequent REDUCE clauses.)

    result-variable-name = initial-value ...
    DO CONCURRENT (concurrent-control-list, mask-expression)
        block ! where reduction-variable-name is set
        result-variable-name &
            = operation (result-variable-name, reduction-variable-name) ...
    END DO
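
The following is a sketch of this as-if model for the sum-of-squares
example, written as ordinary serial loops so that it runs with a
current compiler. It applies the combining step in two different
iteration orders, since the order is left undefined. The program name,
the variables FORWARD and BACKWARD, and the input values are
assumptions for illustration.

```fortran
program naive_semantics
  implicit none
  integer, parameter :: n = 100
  real :: a(n), b(n), the_thing, forward, backward
  integer :: i

  ! Illustrative input data (assumed values).
  a = 1.0
  b = 2.0

  ! result-variable = initial-value, then combine in ascending order.
  forward = 0.0
  do i = 1, n
    the_thing = (a(i) + b(i)) ** 2    ! the block sets the reduction variable
    forward = forward + the_thing     ! result = operation(result, reduction)
  end do

  ! The same steps in descending order, another permitted ordering.
  backward = 0.0
  do i = n, 1, -1
    the_thing = (a(i) + b(i)) ** 2
    backward = backward + the_thing
  end do

  print *, forward, backward
end program naive_semantics
```

With these inputs every term is 9.0, so both orders give 900. In
general, floating-point addition can round differently depending on the
order, so results from different orderings may differ slightly.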