To: J3 12-136 From: Bill Long Subject: Coarray TS features Date: 2012 February 15 References: 09-184, 11-256r2, N1858, N1906 Feature recommendations for the proposed TS on coarray extensions. Included are a three-part rating for estimates of Importance/Usefulness (I) (1 = high .. 5 = low), Editing difficulty into the standard (S) (1 = easy .. 5 = difficult), Implementation difficulty (V) (1 = easy .. 5= difficult). Feature 1: Teams ---------------- Teams provide a capability to restrict the reach of remote memory references to a subset of all the images of the program. This simplifies code for some applications that involve segregated activities (parts of a climate model, for example). Teams also provide a mechanism for limiting activity to a subset of the computer system that might result in better performance within the team (such as within a local SMP domain.) OPTION A: Simple teams. [I:?, S:1, V:2] Teams are a collection of images. The functionality is the same as in N1858. Images are formed from a list of image numbers, there is no remapping of image numbers. Teams can be used in a SYNC TEAM statement or as an argument to a collective. OPTION B: Complicated teams [I:?, S:5, V:4] Teams are formed by splitting current teams. At program beginning, one team exists and includes all images. Team split requires participation of all images in the current parent team, and every team has to join one team or the other. The current team for a particular image is set with a WITH TEAM statement. WITH TEAM (ROW_TEAM) ! The current team is defined as ROW_TEAM. END WITH TEAM A particular image belongs to only one "current" team at a given time. The image numbers are renumbered relative to the current team. Execution of the END WITH TEAM statement causes the image to become a member of the team in effect just before the WITH TEAM statement was executed. Within a WITH TEAM block, statements such as SYNC ALL, ALLOCATE, DEALLOCATE, that the current standard describes as involving "all images" are instead applied to only the images that are in the current team. Complicated teams represent a significant modification of the current Fortran model, for both standard and for implementations. Question: Can images in team A access images in team B? If so, syntax is needed, and detailed work on modification of the base memory model needs investigation. Feature 2: Collectives ---------------------- Collective subroutines offer the possibility of substantially more efficient execution of reduction operations that would be possible by non-expert programmers. Corresponding routines are widely used in MPI programs. This feature has been the most widely requested in the discussion of coarrays. OPTION A: Collective with simple teams. Intrinsic Collective subroutines: CO_SUM CO_MAX CO_MIN CO_BCAST CO_REDUCE See 11-193 for detailed descriptions. Team argument is optional. OPTION B: Collectives with complicated teams. Same list as 2a, except that the TEAM argument is eliminated and the operation is over the current team. For either option, the operator for CO_REDUCE should be pure. Questions: In 11-193 the RESULT argument is optional. The interface is simpler if the RESULT argument is not optional or the RESULT argument is eliminated and rename SOURCE to ARG. The advantage of result being optional is to save space in the case that the SOURCE and RESULT are large, or to allow for both of the cases of desiring to overwrite the source or preserve it. Should RESULT be required to be present if RESULT_IMAGE is present? The current semantics for an absent RESULT involves overwriting the SOURCE even on the images that are not RESULT_IMAGE. For simple teams, should the TEAM argument be the last argument? Feature 3: Atomic operations. ------------------------------ Intrinsic subroutines implementing atomic memory operations provide the capability to write simple and efficient code for several common operations used in parallel programs. These capablities are provided in other parallel programming models and have been available as extensions in some Fortran implementations (though with different syntax). A standard specification would improve the prospects for code portability. All of the proposed subroutines are intrinsic atomic subroutines, expanding the Fortran 2008 list of ATOMIC_DEFINE and ATOMIC_REF. Paper 09-184 provides a detailed description of ATOMIC_CAS. The other routines have similar semantics. The ATOM argument is atomically modified, based on the other arguments. If an optional OLD argument is present, it is assigned the value of ATOM immediately before the specified operation is performed. Base Set: ATOMIC_CAS (ATOM, OLD, COMPARE, NEW) ! Compare-and-swap ATOMIC_ADD (ATOM, VALUE [,OLD]) ! Atomic integer add ATOMIC_AND (ATOM, VALUE [,OLD]) ! Atomic bitwise AND ATOMIC_OR (ATOM, VALUE [,OLD]) ! Atomic bitwise OR ATOMIC_XOR (ATOM, VALUE [,OLD]) ! Atomic bitwise exclusive OR Extended set: ATOMIC_SWAP (ATOM, VALUE) ! Atomic swap ATOMIC_AX (ATOM, MASK, VALUE [,OLD])! Atomic bitwise and-xor Hardware support for the Base Set is available on the widely used processors and also in some commodity and proprietary network interfaces. On most systems, the Base Set should be very easy to implement. Most systems include hardware support for either atomic swap or atomic and-xor, but sometimes not both. The atomic and-xor operation replaces ATOM with ieor(iand(ATOM,MASK),VALUE). This can be used to implement an atomic swap if the OLD argument is specified and MASK=0. The atomic and-xor operation also provides the capability to perform atomic definitions of a subset of the bits of ATOM. Integrating this feature into the Fortran standard should be straightforward. New intrinsic procedures are added. Implications for the memory model semantics are already specified for execution of an atomic subroutine. Feature 4: Events ----------------- Events provide a capability for an image to signal another image which can then detect that the event has been posted. This replaces the NOTIFY/QUERY feature of N1858. It is superior to the old feature because the events associated with user variable, so multiple events can exist for an image, and a library routine could establish events internally that will not interfere with notifications that might be occurring on an image separate from the library code. Terminology: An "event" is an abstraction of an issue of common interest to two or more images that is supposed to "occur" at a moment the "interested" images are "active". An image can "notify" other image(s) of the occurance of an event. An image can "query" for notification of the occurance of an event. Functionality offered: 1. An image can notify other image(s) of the occurrence of an event. 2. An image can query the status of the event (waiting for it to be notified). 3. (Optional) Allow multiple images to query for notification of a single event (for instance, by being "queued"). Syntax: EVENT NOTIFY(EVENT,[SQNR]) ! Notify the occurrence of EVENT (optionally, ! to SQNR in the queue of waiting images). EVENT QUERY (EVENT,[SQNR]) ! Query the status of EVENT (optionally, ! returning SQNR of the notified image). Semantics: EVENT is a data item accessible on all images interested in being notified of the occurrence of the associated event. EVENT is established as being a data item describing an event by the execution of EVENT NOTIFY on the image that wants to signal the occurrence of EVENT and by EVENT QUERY on the image(s) that want to wait for EVENT to have occurred. Challenges/Questions: It is an open question whether it is necessary to have a separate type for EVENTs; A scalar coarray of type integer might be sufficient. The dynamical scope of EVENT must be such that it "exists" before, during and after all NOTIFY's and QUERY's are/have been processed. Do we want to allow Option 3 above - permitting multiple postings of the same event? What semantics result - do multiple postings require the same number of queries? Feature 5: Removing restrictions on coarray components. ------------------------------------------------------- The current restrictions on coarray components result in a poor integration of coarrays with the object oriented programming features of Fortran. The restriction in C432 that a coarray component cannot be added to a type through type extension unless the parent type has a coarray component significantly limits the use of coarray components. It would prohibit using a base type that had no components from ever having a coarray component added, for example. The restriction in C444 that the parent of a coarray component cannot be a coarray limited the declaration of some variables as a coarray. Since coarray components are required to be allocatable (C442) it allocated memory would not be part of the parent structure. I can (probably would) be allocated from the same pool of memory as noncomponent coarrays. As such the above restrictions seem unnecessary. Compiler changes required for implementation of this feature should not be difficult. Integrating this feature would require modification or deletion of several constraints and modification of example code and at least one Note in the standard. There are possibly side-effects of these changes. Checks needed for C432, C444, C526, C557, C620. C640, C643, and Note 4.30. Feature 6: Copointers --------------------- Some algorithms involve large data structures, such as graphs or linked lists, that span many images. These codes would benefit from a pointer-like object for which a remote target is allowed. Copointers provide a mechanism for associating a pointer with a target on a different image. Association is allowed with a local target, a coindexed target. Like ordinary pointers, copointer assignment to another copointer results in association with the target of the other copointer. A method is provided to determine the image number of the target of a copointer. Copointer references include an empty [] to signal a potentially remote reference. Copointers are allowed as components. If the parent object is a coarray, it is possible to associate a remote copointer with a target. Issues: The interaction of copointers with the existing memory model is potentially nontrivial. How copointers can be used in an ASSOCIATE construct needs to be specified. Is ti possible to specify initialization for a copointer? A substantial example code illustrating copointers is needed. Feature 7: Parallel I/O ----------------------- The OPEN statement is extended to allow a file to be opened on set of images. If teams are available, a TEAM= specifier is added to the OPEN statement to specify the team that is the connect team for the external unit. In the absence of teams, a new specifier is added to specify the connect team for the unit. The unit number references the same file on all images in the connect set. All images in a connect team shall execute an OPEN statement with the same connect-specs except for the ERR=, IOMSG=, IOSTAT=, and NEWUNIT= specifiers. The OPEN statement acts as an implicit image synchronization for the images in the connect team. The values permitted for the ACCESS= specifier in an OPEN statement for a connect team containing more than one image shall depend on which of the options below are selected. If an image executes a CLOSE statement on a unit, all images in the unit's connect team shall close the unit with the same file disposition. The CLOSE statement performs a an implicit image synchronization for the images in the connect team. The FLUSH statement shall make data written to an external unit available to all images in the unit's connect team which execute a FLUSH statement for that unit in a subsequent segment. If teams are available, a TEAM= specifier shall be added to the INQUIRE statement. Options A and B are not exclusive of each other. Option A: Direct-access (I:2, S:2, V:3) A file can be connected for direct-access in an OPEN statement with a connection team that contains more than one image. The NEXT_REC= specifier in an INQUIRE statement executed by an image shall be assigned the value n + 1, where n is the record number of the last record read or written by the image Option B: Sequential-access (output only) (I:4, S:3, V:4) A file can be connected for sequential-access in an OPEN statement with a connection team that contains more than one image. The ACTION= specifier in the OPEN statement shall have the value WRITE. The processor shall ensure that once an image starts to write a record to a unit, no other image shall write to the same unit until the complete record has been written. An image shall not execute the file positioning statements BACKSPACE, ENDFILE, and REWIND on a unit whose connect team contains more than one image. Feature 8: Asymmetric allocation. --------------------------------- If the Complicated Teams feature is adopted, this feature is less important since allocation is done only on images of the current team. It is also possible to have different size allocations on different images by employing components of coarray structures.