To: J3                                                     J3/22-138
From: Bill Long
Subject: Liaison reports for MPI and OpenMP
Date: 2022-March-02

Meeting 226 Liaison Reports for MPI and OpenMP.

MPI:
----

There hasn't been much development since the 4.0 standard was
approved. A few proposals are being discussed, but it's still very
early for them.

OpenMP:
-------

The OpenMP group did have another virtual "F2F" meeting earlier this
month where we started early discussions on topics for the 6.0
specification, which is scheduled to be released in November 2023.
Here is a list of things that were discussed:

* Clarify forward progress guarantees for threads on a target device:
  Many OpenMP implementations will map thread teams to threads within
  a warp, which doesn't provide the same forward progress guarantees
  when threads diverge. This can cause problems when trying to use,
  for example, locks or atomic constructs. We will clarify under which
  conditions forward progress is guaranteed for target regions.

* Support scoped atomics: Today, OpenMP doesn't allow atomic
  operations from different devices to concurrently access the same
  variable. We will add a memscope clause to the atomic and flush
  constructs to explicitly allow cross-device synchronization.

* Support per-team allocation of global/static variables: This is to
  provide an OpenMP declarative directive that says variables should
  be allocated in GPU "shared" memory, one per threadblock, similar to
  what the __shared__ attribute achieves in CUDA.

* New declarative C++ attribute specifiers: As of OpenMP 5.1, C++
  attributes may be used in place of pragmas to specify OpenMP
  directives. The plan for 6.0 is to leverage the flexibility of
  attribute specifiers in C++ to attach OpenMP declarative attributes.
  You could have a declaration like:

      int x [[omp::threadprivate]];

* Better support for parallelizing Fortran array syntax in target
  offload regions: Users at Livermore, in particular, have been
  requesting this for a while.
  They've had to write their own translators to get the desired
  parallelism. OpenMP may add something similar to the OpenACC kernels
  construct to better support this.

* More loop transformation directives: Among the directives being
  proposed for 6.0 are loop reversal, fusion, fission, interchange,
  and flattening (collapse).

* Allocator support for memory that is accessible from multiple
  devices: Presently, there isn't a way to allocate memory on one
  device that is guaranteed to be accessible on another device, unless
  the general unified shared memory requirement is satisfied. This
  would provide routines to explicitly create and use allocators for
  multi-device access.

We will have a few more meetings this year, which may be in-person or
hybrid, and will release a TR draft for 6.0 in November.

----