To: J3                                                     J3/26-171
Subject: Edits for UTI 002: collective subroutine sequencing
From: Dan Bonachea & HPC
Date: 2026-June-12
References: 25-127r1, 25-202r3, 26-007r1

1. Introduction
---------------

Papers 25-127r1 and 25-202r3, passed by J3 in meetings #236 and #237
respectively, provide edits for work item DIN1: Collectives over a specified
team.

In the course of applying these edits, the Editor inserted Unresolved Technical
Issue (UTI) 002, which reads as follows in 26-007r1:

 "Suggested wording change for p2 (now p3) did not work.

  That is, does not work unless every invocation in between team transitions
  specified the same team.

  When the teams are not the "current" team, there is a serious problem with
  delineations (the team transitions) as the images outside the current team
  will be almost always be executing completely independent code sequences, and
  completely different invocations of collectives.

  It does not work for two different, overlapping, invocations using child
  teams.

  One could envisage trying to make it work for child teams by something along
  the lines of "the sequence of invocations of collective subroutines shall be
  the same except that a collective subroutine that specifies a child team need
  only be executed on images that are also members of the child team". I don't
  think that works though, as there could be inexplicable processor-dependent
  deadlocks. We allow SYNC TEAM to specify child teams even though it is trivial
  for that to deadlock... but there, the deadlock is reasonably detectable
  without much overhead.

  We surely do not want the internal communications of collectives to
  potentially (and randomly) deadlock.

  I am reasonably confident that with sufficient thought, we can come up with
  something reasonable that allows the good cases, and does not allow the bad
  ones. The last thing we should want to do is to completely neutralise the
  "same sequence" requirement, as that would seriously impact program
  reliability."

The HPC subgroup agrees there is a problem here.

This paper provides background information to fully explain the subtleties of
the problem, and edits to resolve it.

2. Background
-------------

There are many correctness requirements on the invocation of collective
subroutines in Fortran. Many of these are guided by the same basic requirement,
which, informally stated, is that images must "agree" on the order of the
collective subroutine invocations in which they are involved, both with
respect to other collective subroutine invocations and more generally with
respect to image control statements that synchronize images. For demonstration
purposes, here are some trivial counter-examples of programs that need not be
supported:

Example 1: (should be forbidden)
---------

  PROGRAM misordered_collective
    integer :: X = 1
    if (THIS_IMAGE() == 1) SYNC ALL
    call CO_SUM(X)
    if (THIS_IMAGE() /= 1) SYNC ALL
  END PROGRAM

When executed with more than one image, image 1 invokes SYNC ALL and then
CO_SUM, and the other images invoke them in the opposite order. This program
produces silent deadlock on GFortran 16.1 and NAG 7.2, and a fatal runtime
error on HPE Cray Fortran 20.0.

Example 2: (should be forbidden)
---------

  PROGRAM missequenced_collective
    integer :: X = 1
    if (THIS_IMAGE() == 1) then
      call do_collective(.true.)
      call do_collective(.false.)
    else
      call do_collective(.false.)
      call do_collective(.true.)
    end if

    contains
      subroutine do_collective(do_sum)
         logical, intent(in) :: do_sum
         if (do_sum) then
           call CO_SUM(X)
         else
           call CO_MAX(X)
         endif
      end subroutine
  END PROGRAM

When executed with more than one image, image 1 invokes CO_SUM(X) and then
CO_MAX(X), and the other images invoke them in the opposite order. It's not
even clear to a human reader what meaningful result the programmer expected
here. This program computes garbage results on GFortran 16.1, and produces a
fatal runtime error on NAG 7.2 and HPE CCE 20.0.

To understand why such programs are problematic, consider that in the common
(non-error) case, an invocation of a Fortran 2023 computational collective
(CO_{SUM,MIN,MAX,REDUCE}) where the RESULT_IMAGE argument is absent implies
data dependencies that effectively act as a barrier synchronization across the
participating images. Specifically, in the absence of very aggressive
optimization, no image can complete the collective computation until after
every participating image has reached the corresponding reference and provided
its contribution. This synchronization property is not guaranteed by the
language, but is likely exhibited by most implementations (at least when
RESULT_IMAGE is absent), which is one reason why Example 1 might deadlock if it
were permitted.  When RESULT_IMAGE is present, there are similarly data
dependencies that generally incur (weaker) synchronization behaviors, as the
root image usually cannot produce a result until after every other participating
image has provided its contribution.

Even in references where such synchronization need not occur within the
implementation, there are usually other processor-dependent reasons why all
images participating in a collective subroutine reference must "agree" on the
sequence of invocation of collective computations. In Example 2 the images
don't agree on the semantics of the first collective subroutine invocation;
image 1 is attempting to compute a CO_SUM while the rest of the images are
attempting to compute a CO_MAX. It would be difficult to specify a meaningful
result for such a program.

Clearly some kind of ordering property must be maintained in order for programs
to run correctly and meaningfully without deadlock, or breaking in mysterious
ways within the implementation of collective subroutines. What's less clear is
the extent to which the standard should attempt to mandate such a correctness
property. In particular, such ordering properties are very much a dynamic
property of the parallel execution trace, and not something that can be
enforced statically in general.

In Fortran 2023, these problems are dealt with in subclause 16.6 ("Collective
subroutines"). The first paragraph reads:

 "Successful execution of a collective subroutine performs a calculation on all
  the images of the current team and assigns a computed value on one or all of
  them. If it is invoked by one image, it shall be invoked by the same statement
  on all active images of its current team in segments that are not ordered with
  respect to each other; corresponding references participate in the same
  collective computation."

This paragraph clearly forbids multi-image executions of Example 1 above.
Specifically, the SYNC ALL statement defines a segment boundary across images
and specifies that prior execution segments are ordered before the subsequent
ones. In Example 1 different images invoke CO_SUM from segments that are
ordered with respect to each other, and thus violate the "not ordered with
respect to each other" rule of paragraph 1. This is a useful rule, but it only
helps in the specific situation where the problematic calls appear in segments
that happen to be relatively ordered across images. It does nothing to forbid
Example 2, where there are no segment boundaries near the problematic collective
invocations, and they do not appear in segments ordered with respect to each
other.

This UTI revolves around what we will informally call the "same sequence"
requirement of collective subroutines, which is intended to cover cases such as
Example 2. Here is the wording of that requirement as it appears in Fortran
2023 (16.6 p2):

 "Before execution of the first CHANGE TEAM statement on an image, in between
  executions of CHANGE TEAM and/or END TEAM statements, and after the last
  execution of an END TEAM statement, the sequence of invocations of collective
  subroutines shall be the same on all active images of a team."

This requirement is intended to ensure that all images involved "reach" the
same collective subroutine invocations in the same order, even when no relative
segment ordering is involved. This clearly forbids multi-image executions of
Example 2, because: (1) there are no CHANGE TEAM statements (thus the entire
execution precedes "the first CHANGE TEAM"), (2) there is only one team (the
initial team), and (3) the sequence of invocations of collective subroutines
differs across the active images of the only team (thus the "shall" requirement
is violated and the program execution is not standard-conforming).

At first glance this rule seems to mandate the necessary property, forbidding
program executions that must be forbidden, while still allowing program
executions that should be permitted. Unfortunately the situation is less
clear-cut in programs with non-trivial team involvement.

3. Problems with the "same sequence" requirement in Fortran 2023
----------------------------------------------------------------

The CHANGE TEAM construct provides image control statements that respectively
change the current team of invoking images to a child of the current team
(CHANGE TEAM), and restore the current team back to the parent team (END TEAM).
A useful intuition here is that images maintain a stack-like discipline of
pushing and popping that determines the current team, with the additional
restriction that images can only "push" a direct child team of the
then-current team.
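
As a minimal sketch of this stack-like discipline (the program and team
variable names below are purely illustrative and do not come from any of the
referenced papers), each CHANGE TEAM "pushes" a direct child of the
then-current team and each END TEAM "pops" back to the parent:

  PROGRAM team_stack_sketch
    USE,INTRINSIC :: ISO_FORTRAN_ENV
    TYPE (TEAM_TYPE) :: T_half, T_quarter  ! hypothetical team names
    INTEGER :: me

    me = THIS_IMAGE()                      ! image index in the initial team
    FORM TEAM (mod(me,2)+1, T_half)        ! children of the initial team

    CHANGE TEAM (T_half)                   ! push: direct child of initial team
      ! Teams formed here are children of T_half, the now-current team.
      FORM TEAM (mod(THIS_IMAGE(),2)+1, T_quarter)

      CHANGE TEAM (T_quarter)              ! push: direct child of T_half
        print *, me, ": image", THIS_IMAGE(), "of", NUM_IMAGES(), "(inner)"
      END TEAM                             ! pop: current team is T_half again

    END TEAM                               ! pop: back to the initial team
  END PROGRAM

Inside each construct, THIS_IMAGE() and NUM_IMAGES() are relative to the team
that is current at that point, which is why the sketch saves the initial-team
image index in me before the first CHANGE TEAM.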

Fortran 2018 relaxed (relative to TS 18508:2015) the requirement that CHANGE
TEAM must collectively be invoked by all members of the current (parent) team
(see subclause C.1.2 [675:23-24]). In Fortran 2018 and later, CHANGE TEAM need
only be invoked by images that are members of the "new" team, i.e., the child
team (see 11.1.5.2 p5-6). Image synchronization performed by the CHANGE TEAM and
END TEAM statements is similarly across the "new" (child) team (see 11.1.5.2
p8 and Note 2). However, the collective "same sequence" requirement quoted above
was not updated to reflect this relaxation. Specifically, it implicitly assumes
that all images agree on the sequence of CHANGE TEAM and END TEAM statements, a
property that is common to many programs but is not required by the standard.

The main problem revolves around the following phrase (Fortran 2023 16.6 p2):

 "... the sequence of invocations of collective subroutines shall be the same
  on all active images of a team."

Of what team? The initial team? The team that was current at the time of the
invocation? All teams of which the images happen to be a member? All of these
are plausible interpretations, and none of them is sufficient to unambiguously
handle non-trivial cases such as the one shown in the following example.

Example 3: (should be permitted)
---------

  PROGRAM tricky_change_team
    USE,INTRINSIC :: ISO_FORTRAN_ENV
    TYPE (TEAM_TYPE) :: T_odds_and_evens, T_everyone ! , T_initial
    INTEGER :: me, ni, x

    me = THIS_IMAGE()
    ni = NUM_IMAGES()
    print *, me, "/", ni, ": HELLO"

    ! Each image is a member of three teams:
    ! T_initial = GET_TEAM()
    FORM TEAM (1, T_everyone)
    FORM TEAM (mod(me,2)+1, T_odds_and_evens)

    SYNC ALL

    x = me
    call CO_MAX(x)  ! COLLECTIVE: T_initial (all images)
    IF (THIS_IMAGE() == 1) print *, "max of initial (first):", x

    IF (mod(me,2) == 1) THEN ! Roughly half the images execute the following:

      CHANGE TEAM(T_odds_and_evens) ! BOUNDARY: odd-numbered images only

        x = me
        call CO_MAX(x) ! COLLECTIVE: odd team only of T_odds_and_evens
        IF (THIS_IMAGE() == 1) print *, "max of odds:", x

      END TEAM ! BOUNDARY: odd-numbered images only

    END IF

    print *, me, "/", ni, ": HERE"
    x = me
    call CO_MAX(x)  ! COLLECTIVE: T_initial (all images)
    IF (THIS_IMAGE() == 1) print *, "max of initial (second):", x

    CHANGE TEAM(T_everyone) ! BOUNDARY: all images

      x = me
      call CO_MAX(x)  ! COLLECTIVE: T_everyone (all images)
      IF (THIS_IMAGE() == 1) print *, "max of everyone:", x

    END TEAM ! BOUNDARY: all images

    SYNC ALL
    print *, me, "/", ni, ": DONE"
  END PROGRAM

In this example, a subset of the images CHANGE TEAM to a child team, invoke a
collective subroutine over that team, and then END TEAM, after which all the
images on the initial team invoke a collective subroutine. Unlike the examples
given in Section 2, the HPC subgroup believes there is no fundamental reason
that executions of this program should be forbidden. This program should execute
(without deadlocks) as the programmer intended. Indeed, testing shows it
executes correctly and produces the expected results on GFortran 16.1 and NAG
7.2 (it produces a fatal runtime error on HPE CCE 20.0, which has been reported
as HPE Case #5400026783).

A very strict reading of the current (ambiguous) requirement in Fortran 2023
16.6 p2 might imply that any program that does not execute an identical sequence
of CHANGE TEAM and END TEAM statements on every image would be prohibited from
invoking any collective subroutines, at least in the intervals where the
sequences are not identical across every image. Such an interpretation is overly
restrictive and almost certainly not what we want to specify.

4. Collectives over a specified team
------------------------------------

Everything discussed to this point deals only with the existing Fortran 2023
feature set and problematic verbiage in the Collective Subroutines section of
the Fortran 2023 standard. Now we need to consider the additional complication
added by work item DIN1 (with edits passed in 25-127r1 and 25-202r3). This new
feature adds a TEAM argument to collectives, which enables execution of
collective subroutines on a specified team (in particular a team that differs
from the current team).

Observations:

1. With the introduction of collectives on a specified team, any collective
   subroutine reference can specify any of a number of teams (where the current
   team is just one option). As such, the value of the current team no longer
   uniquely determines the images involved in a collective subroutine reference.

2. Moreover, an image may be a member of many teams and execute collective
   subroutines over all of them without ever executing a CHANGE TEAM construct.
   Indeed, this was one of the primary motivations for the DIN1 feature
   (collectives over a specified team); namely, to decouple collective
   subroutine invocations from the current team and the CHANGE TEAM construct.

3. As such, the CHANGE TEAM construct has become mostly irrelevant to the
   collective sequencing requirements.  CHANGE TEAM and END TEAM continue to be
   image control statements that impose segment order (and thus impact the "in
   segments that are not ordered with respect to each other" requirement), but
   it no longer makes sense to structure collective sequencing requirements
   around the CHANGE TEAM construct.
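
Observations 1 and 2 can be illustrated with a minimal sketch. It ASSUMES that
the new argument is passed with the keyword TEAM= and that a team formed from
the current team by FORM TEAM may be specified; the authoritative interface is
the one given in 25-127r1 and 25-202r3. The program name is hypothetical.

  PROGRAM collectives_without_change_team
    USE,INTRINSIC :: ISO_FORTRAN_ENV
    TYPE (TEAM_TYPE) :: T_odds_and_evens
    INTEGER :: me, x, y

    me = THIS_IMAGE()
    FORM TEAM (mod(me,2)+1, T_odds_and_evens)

    x = me
    call CO_MAX(x)  ! COLLECTIVE: current (initial) team, all images

    y = me
    ! ASSUMED spelling of the DIN1 team argument:
    call CO_MAX(y, TEAM=T_odds_and_evens) ! COLLECTIVE: odd or even team

    x = me
    call CO_MAX(x)  ! COLLECTIVE: current (initial) team, all images

    print *, me, ": max of all:", x, ", max of my parity team:", y
  END PROGRAM

No image executes CHANGE TEAM, so the current team is the initial team
throughout, yet each image participates in collectives over two different
teams. Every image executes the same statements, so the members of each team
see the references that specify that team in the same order.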

Nevertheless, we still need some requirements to specify what execution
sequences of collective subroutine references are permitted, and to clarify
which execution sequences could possibly lead to deadlock. Edits are provided
below.

5. Edits (relative to 26-007r1)
-------------------------------

-------------------------------------------------------------------------
[425] (17.6 Collective subroutines)

Delete UTI 002.

-------------------------------------------------------------------------
[425:5-7] (17.6 Collective subroutines, paragraph 3)

Delete the first sentence that reads:

 "Before execution of the first CHANGE TEAM statement on an image, in between
  executions of CHANGE TEAM and/or END TEAM statements, and after the last
  execution of an END TEAM statement, the sequence of invocations of collective
  subroutines shall be the same on all active images of a team."

And replace that sentence with the following sentences:

 "All participating images shall invoke collective subroutines in the same order
  per team. Specifically, once any image begins executing the invocation of a
  collective subroutine, all other active images in the specified team shall
  eventually invoke the corresponding reference, and no other collective
  subroutine reference with the same specified team in between. An invocation of
  a collective subroutine is not guaranteed to complete execution on an image
  until after all participating images have reached the corresponding reference."

Add new note to the same paragraph:

 "NOTE: The processor is not required to detect violations of the rule that
        collective subroutines are invoked in the same order per team, nor is
        it responsible for detecting or resolving deadlock problems (such as
        two images waiting on different collective subroutine references
        specifying different teams with overlapping membership).
        The implementation of collective subroutines is permitted but not
        required to include synchronization between participating images.
        A program that makes assumptions about the presence or absence of
        unspecified synchronization is unlikely to be portable."

-------------------------------------------------------------------------

6. Rationale
------------

The first two sentences of the edits above specify the required condition for
correct execution of collective computation, which is roughly that all images
on a given team must "agree" on the order of corresponding collective subroutine
references specifying that team. The wording is modeled on wording appearing in
section 6.12 of the MPI 5.0 standard [250:1-4], where the analogous problem
arises for MPI collective operations performed over a specified communicator
(the MPI analog to a Fortran team).

The final sentence and NOTE above explicitly clarify that a reference to any
collective subroutine *might* include (in the worst case) fully blocking
synchronization between all participating images. This in turn implies that
portable programs should always be written under the assumption of such
synchronization, and that any deadlocks arising from such synchronization
represent a programmer error. (Wording here was based on 26-007r1 5.3.4 NOTE 2
and 9.8.1.2 NOTE 1.)
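
The deadlock scenario mentioned in the NOTE can be sketched as follows. As in
the previous section, the sketch ASSUMES the DIN1 argument is spelled TEAM=;
the program and team names are hypothetical. For brevity the two teams have
identical membership, which is a special case of overlapping membership.

  PROGRAM cross_team_order_sketch
    USE,INTRINSIC :: ISO_FORTRAN_ENV
    TYPE (TEAM_TYPE) :: T_a, T_b  ! hypothetical team names
    INTEGER :: x, y

    ! Every image joins both teams, so their memberships overlap.
    FORM TEAM (1, T_a)
    FORM TEAM (1, T_b)

    x = THIS_IMAGE()
    y = THIS_IMAGE()
    IF (THIS_IMAGE() == 1) THEN
      call CO_SUM(x, TEAM=T_a)  ! image 1: T_a first (ASSUMED TEAM= spelling)
      call CO_SUM(y, TEAM=T_b)
    ELSE
      call CO_SUM(y, TEAM=T_b)  ! other images: T_b first
      call CO_SUM(x, TEAM=T_a)
    END IF
  END PROGRAM

Each team is specified by only one collective reference, so there is no
disagreement about the order of references within either team. However, if the
processor's CO_SUM blocks until every participating image arrives, then with
more than one image, image 1 waits in the reference specifying T_a while the
other images wait in the reference specifying T_b, and no image can proceed.
As we read the proposed wording, the processor is not required to detect or
resolve this situation; avoiding it is the programmer's responsibility.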

The proposed approach is analogous to the general approach taken with all other
image-synchronization operations in the language. Image deadlocks are not
forbidden by the standard, either as a general principle or by imposing
rules of execution that would conservatively prevent deadlocks. Instead, the
semantics of image synchronization are written to explain the execution
dependencies they induce between images, which may lead to deadlock as a
natural consequence during the execution of programs that are not structured to
respect those inter-image dependencies.

===END===