To: J3                                                     J3/22-138
From: Bill Long
Subject: Liaison reports for MPI and OpenMP
Date: 2022-March-02

Meeting 226 Liaison Reports for MPI and OpenMP.


MPI:
----

 There hasn't been much development since the 4.0 standard was
 approved. A few proposals being discussed, but it's still very early
 for them.


OpenMP:
-------

The OpenMP group did have another virtual "F2F" meeting earlier this
month where we started early discussions on topics for the 6.0
specification, which is scheduled to be released in November
2023. Here is a list of things that were discussed:

* Clarify forward progress guarantees for threads on a target device:
  Many OpenMP implementations will map thread teams to threads within
  a warp, which doesn't provide the same forward progress guarantees
  when threads diverge. This can cause problems when trying to use,
  for example, locks or atomic constructs. We will clarify under which
  conditions forward progress is guaranteed for target regions.

* Support scoped atomics: Today, OpenMP doesn't allow atomic
  operations from different devices to concurrently access the same
  variable. We will add a memscope clause to the atomic and flush
  constructs to explicitly allow cross-device synchronization.

* Support per-team allocation of global/static variables: This is to
  provide an OpenMP declarative directive that says variables should
  be allocated in GPU "shared" memory, one per threadblock, similar to
  what the __shared__ attribute achieves in CUDA.

* New declarative C++ attribute specifiers: As of OpenMP 5.1, C++
  attributes may be used in place of pragmas to specify OpenMP
  directives. The plan for 6.0 is to leverage the flexibility of
  attribute specifiers in C++ to attach OpenMP declarative
  attributes. You could have a declaration like: "int x
  [[omp::threadprivate]];"

* Better support for parallelizing Fortran array syntax in target
  offload regions: Users at Livermore, in particular, have been
  requesting this for a while. They've had to write their own
  translators to get the desired parallelism. OpenMP may add something
  similar to the OpenACC kernels construct to better support this.

* More loop transformation directives: Among the directives being
  proposed for 6.0 are loop reversal, fusion, fission, interchange,
  and flattening (collapse).

* Allocator support for memory that is accessible from multiple
  devices: Presently, there isn't a way to allocate memory on one
  device that is guaranteed to be accessible on another device, unless
  general unified shared memory requirement is satisfied. This provide
  routines to explicitly create and use allocators for multi-device
  access.

We will have a few more meetings this year which may be in-person or
hybrid and release a TR draft for 6.0 in November.

----