To: J3 J3/25-145r1 From: Brandon Cook & Dan Bonachea Subject: Requirements for US20: Local Prefix Operation Intrinsics Date: 2025-June-24 #Reference: J3/25-145, J3/24-157, J3/23-235r2, J3/23-113, WG5/N-2239 1. Introduction =============== Scan, or prefix reduction, operations are fundamental building blocks in parallel algorithms and data manipulation tasks. The SCAN and CO_SCAN proposal (J3/23-235r2) received "mixed support" at a previous meeting, and prospective work items, including prefix reduction operations, were "conditionally accepted" at the 2024 meeting, pending further discussion. They were again discussed at the February 2025 WG5 meeting, and subsequently promoted to "accepted" work item US20 via WG5 letter ballot in May 2025 (WG5/N-2239). The result of the WG5 vote was 20 yes 5 no and 1 undecided with several informative comments. This document focuses on requirements exclusively for the local prefix reduction variant, refining previous concepts based on community and WG5 feedback. By focusing exclusively on the local variant our aim is to allow consideration independent of the closely related but distinct collective subroutines. We do however revisit use cases briefly for clarity as we are now addressing the two variants separately. 2. Nomenclature ================ The term "SCAN" has been identified as problematic due to potential conflicts and historical connotations. In this document we adopt the terminology of "prefix reduction" to reference the operation. 3. Motivation and Local Prefix Reduction Use Cases ================================================== A local prefix reduction intrinsic offers significant benefits for Fortran programmers by providing a standardized, optimizable way to perform common computational patterns. There are many additional motivating use cases available in the literature (and described in J3/23-113). 3.1. Sparse Matrix Algorithms ------------------------------------------- Many high-performance computing applications rely heavily on sparse matrix operations. Prefix operations are important in these contexts, for example: * Computing Offsets: In constructing sparse matrix representations, a prefix reduction over the number of non-zero elements per row/column is used to compute the offsets into the final data structure. Document J3/24-157 highlights this use case from the MFDn application, noting that this step is non-trivial to implement portably and efficiently across different architectures (CPU and GPU). * Segmented prefix reductions for SpMV: Load-balancing sparse matrix-vector product (SpMV) kernels on GPUs often employ segmented prefix reduction operations. The Ginkgo library utilizes such algorithms to achieve high performance on these kernels. It is not clearly possible to implement these approaches in Fortran without such an intrinsic. 3.2. Implementation of Existing Intrinsics ------------------------------------------ Efficient implementations of some existing Fortran intrinsics can be expressed using prefix operations. J3/24-157 provides an example implementation of the `PACK` intrinsic using an inclusive prefix reduction to calculate the destination indices for the packed elements. 4. Discussion of Requirements ==================================== Based on feedback and existing practice, the design space for prefix intrinsics warrants careful consideration. 4.1. General vs. Specific Intrinsics ------------------------------------ Adding a general `REDUCE_PREFIX` for flexibility while also providing a highly-optimizable `SUM_PREFIX` for the most common use cases creates a consistent family of intrinsics, where `SUM` and `REDUCE` have `_PREFIX` variants. With straightforward transformations PRODUCT_PREFIX and COUNT_PREFIX can be implemented with SUM_PREFIX and MIN_PREFIX and MAX_PREFIX are possible with appropriate arguments to REDUCE_PREFIX(see Examples). 4.2. Handling Inclusive and Exclusive Prefix Reductions ------------------------------------------------------- Explicitly named functions distinguish between the inclusive and exclusive variants of prefix reductions. 5. SUM_PREFIX ============== 5.1 Existing SUM and new routines --------------------------------- For reference, the syntax for SUM is SUM(ARRAY, DIM [, MASK]) or SUM(ARRAY [, MASK]) The new intrinsics are SUM_PREFIX_INCLUSIVE(ARRAY [, MASK]) SUM_PREFIX_INCLUSIVE(ARRAY, DIM [, MASK]) SUM_PREFIX_EXCLUSIVE(ARRAY, [, MASK]) SUM_PREFIX_EXCLUSIVE(ARRAY, DIM [, MASK]) 5.2 Semantic Requirements ------------------------- These requirements follow a straightforward extension of SUM. ARRAY is an array of numeric type. The type and kind parameter of the result is determined by the type of ARRAY, as specified for the `+` operator. The rank and shape of the result is the same as the rank and shape of ARRAY. MASK is of type logical and conformable with ARRAY. The values of the result in the inclusive case with rank 1 ARRAY and no MASK should be equivalent to the naive serial implementation as follows RESULT(1) = ARRAY(1) DO i = 2, size(ARRAY) RESULT(i) = RESULT(i-1) + ARRAY(i) END DO The values of the result in the exclusive case with rank 1 ARRAY and no MASK should be equivalent to the naive serial implementation as follows RESULT(1) = 0 DO i = 2, size(ARRAY) RESULT(i) = RESULT(i-1) + ARRAY(i-1) END DO In the case of a multidimensional ARRAY with no DIM argument, RESULT should be computed analogously to the above, with the iteration proceeding in array element order. In the case of a specified DIM, the prefix reduction should be performed elementwise along that dimension. In the case of a specified MASK, elements of ARRAY contribute to the sum if and only if the corresponding element of MASK is true. 6. REDUCE_PREFIX ================ 6.1 Existing REDUCE and new routines ------------------------------------ For reference, the syntax of REDUCE is REDUCE(ARRAY, OPERATION [,MASK, IDENTITY, ORDERED]) or REDUCE(ARRAY, OPERATION, DIM [,MASK, IDENTITY, ORDERED]) The new intrinsics are REDUCE_PREFIX_INCLUSIVE(ARRAY, OPERATION [, MASK, IDENTITY, ORDERED]) REDUCE_PREFIX_INCLUSIVE(ARRAY, OPERATION, DIM [, MASK, IDENTITY, ORDERED]) REDUCE_PREFIX_EXCLUSIVE(ARRAY, OPERATION, IDENTITY [, MASK, ORDERED]) REDUCE_PREFIX_EXCLUSIVE(ARRAY, OPERATION, IDENTITY, DIM [, MASK, ORDERED]) Note that it is not possible to simultaneously preserve consistent argument order both between the prefix and non-prefix versions of REDUCE and between inclusive and exclusive definitions, with the constraint that `IDENTITY` be a non-optional argument for REDUCE_PREFIX_EXCLUSIVE. 6.2 Semantic Requirements ------------------------- These requirements follow a straightforward extension of REDUCE: OPERATION should be a pure function. OPERATION should accept exactly two arguments; the result and each argument must be a scalar, nonallocatable, noncoarray, nonpointer, nonpolymorphic, nonoptional data object with the same declared type and type parameters as the input ARRAY. OPERATION need not be a commutative operator. The computation will not reorder operands. If ORDERED is absent or .false., then OPERATION should be computationally associative. If ORDERED is present with the value .true., then the computation is guaranteed to effectively proceed in array element order, and operations will not be reassociated. The rank and shape of the result is the same as the rank and shape of ARRAY. MASK is of type logical and conformable with ARRAY. IDENTITY has the same type and type parameters as ARRAY and is a scalar. The values of the result in the inclusive case with rank 1 ARRAY, ORDERED=.true. and no MASK should be equivalent to the naive serial implementation as follows RESULT(1) = ARRAY(1) DO i = 2, size(ARRAY) RESULT(i) = OPERATION(RESULT(i-1), ARRAY(i)) END DO The values of the result in the exclusive case with rank 1 ARRAY, ORDERED=.true. and no MASK should be equivalent to the naive serial implementation as follows RESULT(1) = IDENTITY DO i = 2, size(ARRAY) RESULT(i) = OPERATION(RESULT(i-1), ARRAY(i-1)) END DO In the case of a multidimensional ARRAY with no DIM argument, RESULT should be computed analogously to the above, with the iteration proceeding in array element order. In the case of a specified DIM, the prefix reduction should be performed elementwise along that dimension. In the case of a specified MASK, elements of ARRAY contribute to the sum if and only if the corresponding element of MASK is true. 8. Examples =========== Assume function add(a,b) result(r) integer, intent(in) :: a,b; integer :: r; r=a+b end function add 8.1 Inclusive prefix reduction with + -------------------------------------- A = [1, 2, 3, 4] R = SUM_PREFIX_INCLUSIVE(A) ! R will be [1, 3, 6, 10] R = REDUCE_PREFIX_INCLUSIVE(A, OPERATION=add) ! R will be [1, 3, 6, 10] 8.2 Exclusive prefix reduction with + -------------------------------------- A = [1, 2, 3, 4] R = SUM_PREFIX_EXCLUSIVE(A) ! R will be [0, 1, 3, 6] R = REDUCE_PREFIX_EXCLUSIVE(A, OPERATION=add, IDENTITY=0) ! R will be [0, 1, 3, 6] 8.3 Exclusive Sum along dimension 2 with a mask: ------------------------------------------------ B = RESHAPE([1,2,3,4,5,6], [2,3]) ! B = | 1 3 5 | ! | 2 4 6 | M = RESHAPE([.T., .T., .F., .T., .T., .T.], [2,3]) R = SUM_PREFIX_EXCLUSIVE(B, DIM=2, MASK=M) ! R will be | 0 1 1 | ! | 0 2 6 | 8.4 COUNT_PREFIX via SUM_PREFIX ------------------------------- function COUNT_PREFIX_INCLUSIVE(MASK) LOGICAL, DIMENSION(:), intent(in) :: MASK INTEGER, DIMENSION(:) :: COUNT_PREFIX_INCLUSIVE COUNT_PREFIX_INCLUSIVE = & SUM_PREFIX_INCLUSIVE(MERGE(1, 0, MASK)) end function 8.5 PRODUCT_PREFIX via SUM_PREFIX ---------------------------------- Note: This implementation is more numerically stable than a direct PRODUCT_PREFIX due to the error accumulation properties of addition compared to multiplication. function PRODUCT_PREFIX_INCLUSIVE(ARRAY) INTEGER, DIMENSION(:), INTENT(IN) :: ARRAY INTEGER, DIMENSION(:) :: PRODUCT_PREFIX_INCLUSIVE PRODUCT_PREFIX_INCLUSIVE = & EXP(SUM_PREFIX_INCLUSIVE(LOG(ARRAY))) end function 9.0 Summary of straw vote results and design decisions ====================================================== 25-145 Requirements for US20: Local Prefix Operation Intrinsics Straw votes: Poll: A) General REDUCE_PREFIX, B) Highly specific, C) Hybrid with REDUCE and SUM versions only. People can vote more than once. Y/N/U A) 5/10/8 B) 9/7/7 C) 10/1/8 Poll: Inclusive/Exclusive versions A) Are EXCLUSIVE and IDENTITY optional arguments, or B) are there separate INCLUSIVE and EXCLUSIVE names that make IDENTITY a required argument? A/B/U 2/17/2 The paper has been revised to have separate INCLUSIVE and EXCLUSIVE routines. Poll: Generalize IDENTITY to allow rank-1 array? Scalar/Array/Undecided 15/0/6 The inclusion of this change would also potentially result in expanding the scope of edits to also include REDUCE for consistency. In the discussion it was concluded that this could be added in the future if a compelling use case arises. Poll: Segmented operation Should a segmented variant of the operation be included in the design? Y/N/U 0/15/7 Based on these votes the following revisions were made for 25-145r1: 1. consolidated on REDUCE and SUM (option C) 2. presented separate routines for inclusive and exclusive variants 3. removed references to a generalized IDENTITY 4. removed references to a segmented variant