To: J3 J3/22-188 From: Bill Long Subject: OpenMP Liaison Report Date: 2022-October-14 The OpenMP committee is finalizing Technical Report 11 (TR11) for release at SC this year. It is the first preview draft of OpenMP 6.0. It's unclear at this point if 6.0 will be released in November 2023 or 2024, but 2024 seems more likely at this point. Here are some of the major changes that have already been drafted by the OpenMP language committee and will be in TR11: * "strict" modifier for the num_threads clause, requires implementation to provide exactly the number of threads requested for a parallel region or error out. * extended syntax for setting the default device or the available devices based according to device type * support for system-scoped atomics, with the memscope clause * an apply clause was added to loop transformation directives (currently tile and unroll) for specifying additional loop transformations on the resulting loops. This gives the programmer control over the sequence of loop transformations to perform on a given loop nest. * add support for mapping assumed-size arrays * New C++ "decl" attribute is available as alternative to using certain declarative pragma directives. Example: int x [[omp::decl(threadprivate)]]; instead of: int x; #pragma omp threadprivate(x) Other features that will likely make it in to TR11 for which the writeup is still in progress: * loop transformation directives for interchange and reverse * operations an induction clause for specifying general induction * operations (generalization of linear). Example: !$omp do induction(*, step(2): k) do i=1,n k = k * 2 ... end do * Also a directive for user-defined inductions ("declare induction"). * A new groupprivate directive, mainly useful for specifying a private copy of a global/static variable should be created for each "thread group" (e.g. for NVIDIA devices, this could result in creating a copy in fast, shared memory for each thread block). * New APIs to enable allocation of memory that is accessible from multiple devices * Clarification on forward progress for threads created by parallel constructs, and a "safesync" clause for ensuring certain threads in the team can safely synchronize when they're diverged and still make progress. This is useful for implementations that map OpenMP threads to the same warp on a GPU (all OpenMP compilers that support target offload to GPUs to do this, to my knowledge) * Specify that the predefined reduction operations work on certain non-arithmetic or non-intrinsic types that meet certain conditions, without requiring user-defined reductions ("declare reduction"). There are plenty of other features that are being discussed for eventual inclusion in 6.0. Here's a listing of the major ones that have had recent discussion: * free agent tasks -- allows an OpenMP task to be picked up and executed by a thread outside the current thread team with a free_agent clause. * clarify mapping behavior for Fortran allocatables. The fact that Fortran seems to be silent on how allocatables are stored has made it challenging to describe the mapping semantics on target constructs. The current language is somewhat vague, and implementations do different things. * better support for parallelization of Fortran array syntax in target regions * various improvements to tool interfaces for monitoring target regions or target device operations * more loop transformation directives (e.g., fission, fusion) * new OpenMP types for representing sets of devices or threads, loop schedules * extensions for expressing how memory resulting from an OpenMP allocator should be partitioned --END--