To: J3                                                     J3/25-165r1
Subject: US04: Specifications for Asynchronous Collective Subroutines
From: Brandon Cook & Damian Rouson & Dan Bonachea & Reuben D. Budiardja
Date: 2025-October-16
References: J3/25-162r2, J3/23-174, J3/25-007r1, WG5/N2245, WG5/N2249

1. Background
-------------

Use case paper J3/23-174 "Asynchronous Tasks in Fortran" describes "non-blocking collectives" that inspire the current requirements for what we are now referring to as "asynchronous collective subroutines". Paper WG5/N2245 provides a rationale for asynchronous collective subroutines. The current Fortran 202Y work list, WG5/N2249, includes asynchronous collective subroutines as accepted work item US04. Paper J3/25-162r2, "US04: Requirements for Asynchronous Collective Subroutines", presents an illustrative use case and a list of requirements.

The current paper provides specifications for the asynchronous collective subroutines feature. All identifier names should be considered notional and provisional, subject to resolution in a forthcoming syntax paper.

1.1 Illustrative Examples
-------------------------

The examples below demonstrate the initiation of asynchronous collective subroutines.

EXAMPLE 1:

    subroutine simple_overlap
      use iso_fortran_env, only: completion_type
      type(completion_type) :: CC
      integer, asynchronous :: X, Y
      X = this_image(); Y = this_image()*7
      call co_sum(X, completion=CC)  ! initiate asynchronous collectives
      call co_max(Y, completion=CC)
      ! no references to X,Y permitted while collectives are outstanding
      call do_something
      call complete ( CC )           ! await completion of all collectives in CC
      print *, X, Y
    end subroutine

This code demonstrates a new derived type, which we notionally spell "completion_type" (sec 2), provided by the intrinsic module ISO_FORTRAN_ENV. Its purpose is to track completion of explicitly asynchronous collectives. The code above passes an object of completion_type as a new optional COMPLETION argument to the existing collective subroutines.
By passing that argument, the programmer expresses their desire for the collective to proceed asynchronously with respect to other work. The programmer additionally asserts that the variables involved in the collective will not be referenced or defined until after completion of the collective has been synchronized. The synchronization happens in a subsequent call to an intrinsic subroutine that we notionally name "complete" (sec 4).

EXAMPLE 2:

    module async_collectives
      use iso_fortran_env, only: completion_type
      real :: A=1., B=2.
      type(completion_type) :: C(2)
    contains
      subroutine overlap_communication_computation
        integer :: s
        block
          asynchronous :: A, B, s
          call co_sum(A, completion=C(1))          ! initiate asynchronous collectives
          call co_min(B, completion=C(2), stat=s)
          ! no references to A,B,s permitted while collectives are outstanding
          call do_something
          call test_progress
          call complete ( C )   ! await completion of all collectives in C
        end block
        print *, A, B, s
      end subroutine

      subroutine test_progress
        logical :: done
        block
          asynchronous :: A
          call complete( C(1), query=done )   ! test for completion
          if (.not. done) return
        end block
        print *, A   ! here it's safe to reference A
      end subroutine
    end module

This example demonstrates some more complex use cases. These involve an array of completion variables and a non-waiting completion query.

2. Completion Type
------------------

T1. Add a new derived type COMPLETION_TYPE to the intrinsic module ISO_FORTRAN_ENV.

T2. COMPLETION_TYPE represents an abstraction for the completion status of one or more asynchronous collective operations.

T3. COMPLETION_TYPE is a derived type with private components. It is an extensible type with no type parameters. Each nonallocatable component is fully default-initialized.

T4. A scalar variable of type COMPLETION_TYPE is a completion variable.

T5. The value of a completion variable includes its completion count, which is the number of outstanding asynchronous collective operations associated with that variable.

T6. The initial value of the completion count of a completion variable is zero. This represents that no outstanding asynchronous collective operations are associated with the variable.

T7. A completion variable shall not appear in a variable definition context, with two exceptions:
    - as an allocate-object;
    - as an actual argument in a reference to a procedure with an explicit interface, if the corresponding dummy argument has INTENT (INOUT).

T8. A variable with a nonpointer subobject of type COMPLETION_TYPE shall not appear in a variable definition context, with three exceptions:
    - as an allocate-object in an ALLOCATE statement without a SOURCE= specifier;
    - as an allocate-object in a DEALLOCATE statement;
    - as an actual argument in a reference to a procedure with an explicit interface, if the corresponding dummy argument has INTENT (INOUT).

3. Asynchronous Collective Initiation
-------------------------------------

C1. Add a new argument with keyword COMPLETION to each of the five existing collective subroutines: CO_BROADCAST, CO_MAX, CO_MIN, CO_REDUCE, and CO_SUM.

C2. COMPLETION is an optional INTENT(INOUT) argument.

C3. If the optional COMPLETION argument is absent, the collective subroutine retains the semantics specified in Fortran 2023, unchanged.

C4. If the optional COMPLETION argument is present, it shall be a completion variable of type COMPLETION_TYPE from the intrinsic module ISO_FORTRAN_ENV. It shall not be coindexed.

C5. If the optional COMPLETION argument is present, then the following arguments shall have the ASYNCHRONOUS attribute: the A argument, and the STAT and ERRMSG arguments if present.

C6. When the optional COMPLETION argument is present, the completion count of the completion variable is incremented by one before the subroutine returns. The completion variable then becomes "associated" with the asynchronous collective operation. This subroutine call is termed the "initiation" of the asynchronous collective operation.

C7. Implementations are encouraged to return as quickly as possible from the initiation call. They are nevertheless permitted to delay returning from the initiation call for processor-dependent reasons.

C8. After initiation, the collective operation proceeds asynchronously towards completion. Once the collective operation finishes its computation with respect to the current image, the implementation decrements the completion count of the associated completion variable by one.
    - This decrement occurs regardless of whether an error condition occurs.
    - This decrement is permitted to occur before the initiation call returns.
    - A decrement of a completion count on one image does not imply that other images participating in the same collective operation have finished their computation.

C9. An asynchronous collective operation is considered "outstanding" on a given image from its initiation until the completion count of the associated completion variable next reaches zero.

C10. While an asynchronous collective operation is outstanding, any variable provided as the A, STAT, or ERRMSG argument to the initiation call is considered a pending communication affector (25-007r1 18.10.4) for an asynchronous input communication. As such, it shall not be referenced, appear in a variable definition context, become undefined, or have its pointer association status changed.

C11. A completion variable shall not become undefined while its completion count is nonzero.

C12. Multiple outstanding asynchronous collective operations are permitted to be associated with the same completion variable. Implementations shall support the association of at least 128 outstanding asynchronous collective operations with any given completion variable.

C13. If the COMPLETION argument is present in a reference to a collective subroutine on one image, it shall be present in all the corresponding references.

4. Completion Subroutine
------------------------

Illustrative syntax:

    CALL COMPLETE( COMPLETION_VAR [, QUERY] )

W1. The COMPLETION_VAR argument provided to the COMPLETE subroutine shall be a scalar or array variable of type COMPLETION_TYPE from the intrinsic module ISO_FORTRAN_ENV. It is an INTENT(INOUT) argument and shall not be coindexed.

W2. QUERY is an optional argument of type logical. If present, it shall have the same shape as COMPLETION_VAR.

W3. When QUERY is absent and COMPLETION_VAR is a scalar, execution of the COMPLETE subroutine causes the executing image to wait until the completion count of COMPLETION_VAR is zero.

W4. When QUERY is absent and COMPLETION_VAR is an array, execution of the COMPLETE subroutine causes the executing image to wait until the completion count of every element of COMPLETION_VAR is zero.

W5. When QUERY is present and COMPLETION_VAR is a scalar, execution of the COMPLETE subroutine causes QUERY to be defined with the value true if the completion count of COMPLETION_VAR is zero, and false otherwise.

W6. When QUERY is present and COMPLETION_VAR is an array, execution of the COMPLETE subroutine causes each element of QUERY to be defined with the value true if the completion count of the corresponding element of COMPLETION_VAR is zero, and false otherwise.

W7. Execution of the COMPLETE subroutine has no effect on segment ordering.

W8. The COMPLETE subroutine is not a collective subroutine.

5. Error Behavior
-----------------

E1. Suppose an error condition occurs during the execution of a collective operation and no STAT variable was present in the initiation call. In that case the implementation initiates error termination. This error termination need not occur during a call to the COMPLETE subroutine.

E2. Suppose a STAT variable was present in the initiation call of an asynchronous collective operation and no error condition occurs during the computation. In that case the STAT variable is assigned the value zero as if by intrinsic assignment.
This assignment occurs between initiation and the decrement of the completion count of the associated completion variable.

E3. Suppose an error condition occurs during the execution of an asynchronous collective operation and a STAT variable was present in the initiation call. In that case the STAT variable is assigned a nonzero value as if by intrinsic assignment. This assignment occurs between initiation and the decrement of the completion count of the associated completion variable.

E4. Suppose an error condition occurs during the execution of an asynchronous collective operation and an ERRMSG variable was present in the initiation call. In that case the ERRMSG variable is assigned an explanatory error message as if by intrinsic assignment. This assignment occurs between initiation and the decrement of the completion count of the associated completion variable.

===END===