To: J3                                                     J3/22-174
From: Bill Long
Subject: OpenMP Liaison Report for M227
Date: 2022-July-19


The committee continues to work on changes for the OpenMP 6.0
specification. In my update last February, I said that OpenMP 6.0 was
scheduled to be released next year, in November 2023. That date is now
tentative, as we may decide to delay the release to the following
year, November 2024, depending on how quickly we can progress in
adding major new features.

There will be a 6.0 technical report draft released this November. The
following changes have already been voted in (I've tagged the ones
that are especially relevant for Fortran):

* [Fortran] Clarify handling of has_device_addr for Fortran array
  sections and data entities that are represented with a descriptor.

* [Fortran] Clauses that accept "locator" list items may now accept
  references to functions that have data pointer results.

* Adds an extension to C++ attribute syntax to better support
  declarative directives.

* Adds environment variables for controlling ICV settings on both host
  and non-host devices.

* Adds the ability to specify a default device according to
  specification of device traits, and a new environment variable
  OMP_AVAILABLE_DEVICES to select which devices on the system are
  available to the program according to device traits.

* Adds a memscope clause to the flush and atomic constructs to limit
  coherence control to some specified partition.

* Adds a strict modifier to the num_threads clause, which forces an
  error (at compile time or runtime) if the implementation is not able
  to create a team with the exact number of threads requested.

* Disallow user-defined mappers for parameterized derived types
  (Fortran) or classes derived from a virtual base class (C++).

* Allow default map types and remove restrictions on the position of
  the map type modifier with respect to other modifiers in a map
  clause.
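
To illustrate the num_threads item above: with the existing clause,
the thread count is a request, so a program that needs an exact team
size has to check it by hand. The following is a minimal sketch of
that manual check, assuming a compiler with OpenMP enabled; the
program and variable names are made up for illustration. The syntax of
the new strict modifier is not given in this report, so it is not
shown.

```fortran
program check_team_size
  use omp_lib
  implicit none
  integer, parameter :: requested = 4   ! hypothetical team size
  integer :: actual

  actual = 0
  ! num_threads is only a request; the implementation may deliver
  ! a smaller team.
  !$omp parallel num_threads(requested)
  !$omp single
  actual = omp_get_num_threads()
  !$omp end single
  !$omp end parallel

  ! Manual stand-in for what the strict modifier is meant to enforce.
  if (actual /= requested) then
     print *, 'requested', requested, 'threads, got', actual
     error stop 'team size mismatch'
  end if
  print *, 'team size:', actual
end program check_team_size
```

With the strict modifier, the hand-written test and error stop would
presumably become unnecessary, since the implementation itself would
report the failure.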

The February update provided a list of 6.0 topics, and the ones that
haven't been completed are still actively being worked on. Here is a
summary of in-progress topics:

* [Fortran] Better support for parallelizing Fortran array syntax in
  target offload regions: Users at Livermore, in particular, have been
  requesting this for a while. They've had to write their own
  translators to get the desired parallelism. OpenMP may add something
  similar to the OpenACC kernels construct to better support this.

* [Fortran] Allow assumed-size arrays to be "mapped" on target
  constructs if the base storage location is mapped.

* [Fortran] Clarify handling of Fortran pointers with undefined
  association status. Disallow their appearance in map clauses.

* [Fortran] Further clarifications for how map should work for
  allocatable variables.

* [Fortran] Add interop runtime API support to Fortran.

* [Fortran] Cover upcoming Fortran 2023 features in OpenMP 6.0.

* Add syntax for expressing task dependences/affinity on a taskloop
  construct. Related: possibly add general mechanism for accessing the
  loop iterator values for the scheduled chunks of iterations
  resulting from a loop-associated directive (e.g., OMP DO, OMP
  TASKLOOP).

* Clarify forward progress guarantees for threads on a target device:
  Many OpenMP implementations will map thread teams to threads within
  a warp, which doesn't provide the same forward progress guarantees
  when threads diverge. This can cause problems when trying to use,
  for example, locks or atomic constructs. We will clarify under which
  conditions forward progress is guaranteed for target regions.

* Support per-team allocation of global/static variables: This is to
  provide an OpenMP declarative directive that says variables should
  be allocated in GPU "shared" memory, one per threadblock, similar to
  what the __shared__ attribute achieves in CUDA.

* Loop transformation directives: Add an apply clause to loop
  transformation directives to specify insertion of loop directives in
  the resulting loop nest. Add more loop transformations, such as loop
  reversal, fusion, fission, interchange, and flattening (collapse).

* Allocator support for memory that is accessible from multiple
  devices: Presently, there isn't a way to allocate memory on one
  device that is guaranteed to be accessible on another device, unless
  the general unified shared memory requirement is satisfied. This
  would provide routines to explicitly create and use allocators for
  multi-device access.
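
To illustrate the array-syntax item at the top of the list above: the
sketch below shows the kind of hand translation that is needed today
to offload a whole-array assignment. It is an assumed example, not the
output of any particular translator, and the program and array names
are made up. It uses only directives that already exist in OpenMP 4.5
and later; without OpenMP enabled, the directives are treated as
comments and the loop runs on the host.

```fortran
program array_syntax_offload
  implicit none
  integer, parameter :: n = 1000
  real :: a(n), b(n), c(n)
  integer :: i

  a = 1.0
  b = 2.0

  ! Array-syntax form the user would like to offload directly:
  !   c = a + 2.0 * b
  !
  ! Hand-translated form: an explicit loop that the combined target
  ! construct can distribute across teams and threads.
  !$omp target teams distribute parallel do map(to: a, b) map(from: c)
  do i = 1, n
     c(i) = a(i) + 2.0 * b(i)
  end do
  !$omp end target teams distribute parallel do

  ! Check the result against the array-syntax expression on the host.
  print *, 'results match: ', all(c == a + 2.0 * b)
end program array_syntax_offload
```

A construct along the lines of OpenACC kernels would let the
array-syntax statement stay as written, leaving the loop generation to
the compiler.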