To: J3                                                     J3/23-122
From: Bill Long
Subject: OpenMP Report
Date: 2023-February-17

The OpenMP Architecture Review Board released Technical Report 11 in
November 2022, the first preview draft of the upcoming OpenMP 6.0
specification (the planned release date for 6.0 is November 2024). The
previous OpenMP update report listed some of the major new features
that were planned for TR11. I will summarize the finalized list here:

* New APIs to enable allocation of memory that is accessible from
  multiple devices

* A new groupprivate directive, mainly useful for specifying that a
  private copy of a global/static variable should be created for each
  "thread group" (e.g., for NVIDIA devices, this could result in
  creating a copy in fast, shared memory for each thread block).

* Loop transformation directives for interchange and reverse operations

* Clarification of forward progress for threads created by parallel
  constructs, and a "safesync" clause for ensuring that certain threads
  in the team can safely synchronize while they are diverged and still
  make progress. This is useful for implementations that map OpenMP
  threads to the same warp on a GPU (to my knowledge, all OpenMP
  compilers that support target offload to GPUs do this).

* A new C++ "decl" attribute, available as an alternative to using
  certain declarative pragma directives. Example:

      int x [[omp::decl(threadprivate)]];

  instead of:

      int x;
      #pragma omp threadprivate(x)

* Support for mapping assumed-size arrays

* A "strict" modifier for the num_threads clause, which requires the
  implementation to provide exactly the number of threads requested for
  a parallel region or to report an error.

* Extended syntax for setting the default device or the available
  devices according to device type

* Support for system-scoped atomics, via the memscope clause

* An apply clause added to the loop transformation directives
  (currently tile and unroll) for specifying additional loop
  transformations on the resulting loops. This gives the programmer
  control over the sequence of loop transformations to perform on a
  given loop nest.

We had a face-to-face meeting during the first week of February. The
following topics were discussed:

* Generalized Inductions: A new induction clause will be added to
  various loop directives for specifying "induction" variables. The
  induction clause is more general than the existing linear clause. It
  supports addition and multiplication as predefined induction
  operations, but also enables custom induction operations to be
  declared for user-defined types via a 'declare induction' directive
  (using a syntax similar to 'declare reduction').

* Unstructured Parallelism: There is a plan to distinguish "structured
  parallelism" (obtained through the use of parallel constructs) from
  "unstructured parallelism". For unstructured parallelism, the
  programmer creates tasks marked as "free agent" outside of any
  parallel construct, and the implementation will use threads available
  in an existing thread pool to execute them.

* Taskloop Dependences: The taskloop construct can be used to partition
  the iterations of a loop into tasks that can be scheduled onto
  available threads. We are planning to add support for expressing
  dependences for these taskloop-generated tasks. This would allow both
  ordering among the tasks generated for a taskloop and ordering with
  respect to tasks generated outside the taskloop that write to or read
  from the same variables. A brief sketch of one possible form follows
  below.
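  As an illustration of the taskloop-dependences idea, the following is
  a minimal sketch. The depend clause is not currently permitted on
  taskloop, so the syntax below is hypothetical and is meant only to
  show the kind of ordering the extension would express (a, n, scale,
  and consume are placeholder names):

      // Hypothetical syntax: depend is not allowed on taskloop today.
      // The intent is that the taskloop-generated tasks are ordered
      // with respect to other tasks that access the same data.
      #pragma omp taskloop grainsize(64) depend(inout: a)
      for (int i = 0; i < n; i++)
          a[i] = scale * a[i];

      // A task created after the taskloop that reads the same array
      // would then wait for the taskloop-generated tasks to complete.
      #pragma omp task depend(in: a)
      consume(a, n);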
* Default Sharing Declaration: Some constructs accept a default clause,
  which may be used to specify an implicit data-sharing attribute (or
  lack thereof) for variables that aren't explicitly scoped by a clause
  on the construct (e.g., default(shared) or default(none)). We are
  considering a declarative directive for setting the implicit
  data-sharing attribute to use for a given variable. So, it would be
  possible to say that a variable X should be implicitly firstprivate
  if it doesn't explicitly appear in a clause.

* Improving Mapping Behavior for Structures: OpenMP has some fairly
  complicated rules and restrictions for mapping structures or parts of
  structures to a device. We are considering ways to lift some of the
  existing restrictions to allow one to map different parts of a
  structure on different constructs.

* Error Model: The severity and message clauses of the error directive
  were added to the parallel construct to enable an implementation to
  emit a particular error message if an error occurs while setting up
  the parallel region. We are also exploring a solution for more
  general error handling on OpenMP constructs. (A sketch combining
  these clauses with the strict num_threads modifier appears at the end
  of this report.)

* Task Graphs: We are considering extensions for generating a task
  graph consisting only of inter-dependent tasks that may be saved and
  re-executed multiple times. Experiments showed that this can reduce
  the runtime overhead that is usually associated with setting up and
  tracking data dependences between tasks as they are created.
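To illustrate two of the features mentioned above, here is a minimal
sketch that combines the strict modifier on num_threads with the
severity and message clauses on the parallel construct. Treat this as
an illustration of the described direction rather than final OpenMP 6.0
syntax; do_work is a placeholder for the parallel region's work.

    // Request exactly 64 threads. With the strict modifier the
    // implementation must provide all 64 or treat the request as an
    // error, reported here with the given message at fatal severity.
    #pragma omp parallel num_threads(strict: 64) \
            severity(fatal) message("could not create 64 threads")
    {
        do_work();
    }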