To: J3 J3/22-174 From: Bill Long Subject: OpenMP Liaison Report for M227 Date: 2022-July-19 The committee continues to work on changes for the OpenMP 6.0 specification. In my update last February, I said that OpenMP 6.0 is scheduled to be released next year, November 2023. That date is now tentative, as we may decide to delay the release to the following year, November 2024, depending on how fast we can progress in adding new major features. There will be a 6.0 technical report draft released this November. The following changes have already been voted in (I've tagged the ones that are especially relevant for Fortran): * [Fortran] Clarify handling of has_device_addr for Fortran array sections and data entities that are represented with a descriptor. * [Fortran] Clause that accept "locator" list items may now accept references to functions that have data pointer results * Adds extension to C++ attribute syntax to better support declarative directives. * Adds environment variables for controlling ICV settings on both host and non-host devices. * Adds the ability to specify a default device according to specification of device traits, and a new environment variable OMP_AVAILABLE_DEVICES to select which devices on the system are available to the program according to device traits. * Adds a memscope clause to the flush and atomic constructs to limit coherence control to some specified partition. * Adds a strict modifier to the num_threads clause, which forces an error (at compile time or runtime) if the implementation is not able to create a team with the exact number of threads requested. * Disallow user-defined mapper for parameterized derived types (Fortran) or classes derived from a virtual base class (C++). * Allow default map types and remove retrictions on position of map type modifier wrt other modifiers in a map clause. The February update provided a list of 6.0 topics, and the ones that haven't been completed are still actively being worked on. Here is a summary of in-progress topics: * [Fortran] Better support for parallelizing Fortran array syntax in target offload regions: Users at Livermore, in particular, have been requesting this for a while. They've had to write their own translators to get the desired parallelism. OpenMP may add something similar to the OpenACC kernels construct to better support this. * [Fortran] Allow assumed-size arrays to be "mapped" on target constructs if the base storage location is mapped. * [Fortran] Clarify handling of Fortran pointers with undefined association status. Disallow them appearing in map clauses. * [Fortran] Further clarifications for how map should work for allocatable variables. * [Fortran] Add interop runtime API support to Fortran * [Fortran] Cover upcoming Fortran 2023 features in OpenMP 6.0 * Add syntax for expressing task dependences/affinity on a taskloop construct. Related: possibly add general mechanism for accessing the loop iterator values for the scheduled chunks of iterations resulting from a loop-associated directive (e.g., OMP DO, OMP TASKLOOP). * Clarify forward progress guarantees for threads on a target device: Many OpenMP implementations will map thread teams to threads within a warp, which doesn't provide the same forward progress guarantees when threads diverge. This can cause problems when trying to use, for example, locks or atomic constructs. We will clarify under which conditions forward progress is guaranteed for target regions. * Support per-team allocation of global/static variables: This is to provide an OpenMP declarative directive that says variables should be allocated in GPU "shared" memory, one per threadblock, similar to what the __shared__ attribute achieves in CUDA. * Loop transformation directives: Add an apply clause to loop transformation directives to specify insertion of loop directives in resulting loop nest. Add more loop transformations, such as loop reversal, fusion, fission, interchange, and flattening (collapse). * Allocator support for memory that is accessible from multiple devices: Presently, there isn't a way to allocate memory on one device that is guaranteed to be accessible on another device, unless general unified shared memory requirement is satisfied. This provide routines to explicitly create and use allocators for multi-device access.