From: R. Bader (Bader_AT_lrz_DOT_de) To: J3 08-294 Subject: Coarray access sequence Date: 2008 November 07 References: J3/08-007r2, WG5/N1753, WG5/N1754, WG5/N1756 This paper aims to solve the problem discussed in section B. of N1754, along the lines indicated in N1756. The relevant section of N1756 is reproduced here to motivate the edits to the standard that follow. The example program in section D. of N1754 is a bit of a red herring since deadlock situations with companion processors or I/O may also occur if only two images are involved. The author has however, in my opinion, hit on a bug in the specification that must be fixed in 8.5.1 para6. In the first bullet of para6, the following situations are covered: ---------------------------------------------------------------- Legend: P, Q, ... are image numbers a is a coarray S(XY) is a pairwise sync which induces segment ordering for the two involved images. Time goes downward ... whatever that means. ---------------------------------------------------------------- P Q | | | | a = ... S(PQ) ~~~~~~~~~~~~~~~ RAW | | ... = a[Q] |<------------| and further diagrams covering WAW (push instead of pull by P), WAR. What is however not covered correctly are the cases in the lower part of the following diagram: P Q R | | | | ... = a[R] |<-----------| S(PQ) ~~~~~~~~~~~~~~~ | WAR (R "passive") | | | a[R] = ... |------------------------->| | | | S(PR) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | S(QR) ~~~~~~~~~~~~~~ | | | | ... = a[R] |<-----------| RAW (R "passive") | | | (or WAW, R "passive", not shown) Indeed it appears the present formulation of the draft standard requires S(PQ) twice in the above diagram, making it look like a one-sided MPI call with a passive partner (not implemented e.g., in MPICH2 or Intel MPI for good reason). In my opinion, R should only be passive with respect to references to a, but not with respect to requiring a sync images (/P,Q,R/) (as drawn in the diagram) for the RAW case, as images P and Q do. However, for WAR indeed only S(PQ) is needed. So I think it is appropriate to introduce the concept of an image being "owner" of a coarray and changing the first bullet to cover all conceivable transactions with the owner. If this is done, one obtains 16 elementary access sequences, of which 12 involve updates. Of these, 9 involve image communication, and of these again, the above 3 need special treatment. Edits to 08-007r2: ~~~~~~~~~~~~~~~~~~ [35] add two new paragraphs to 2.5.7: 4+1 A coarray update may happen on either the image owning it (Uo), or a remote image (Ur). Coarray updates comprise * the definition or undefinition of a coarray variable either on the image owning it (Uo), or a remote image (Ur). * the allocation of an allocatable subobject of a coarray or pointer association of a pointer subobject of a coarray by the image owning it (Uo). Note 2.17+ The definition or undefinition by a remote image of a noncoarray dummy argument whose effective argument is a coarray is not counted as a coarray update. To prevent (existing) noncoarray code from breaking, this case receives special treatment with respect to image execution (8.5.1). 4+2 A coarray reference from the image owning it is denoted by (Ro); a coarray reference from any other image is denoted by (Rr). [32] add a new section: 2.4.6 Coarray access sequence 1 During execution of all images of a program, the sequence of all references or definitions of a coarray entity owned by a particular image (2.5.7) is composed of (possibly overlapping) elementary access sequences. 2 There exist 16 elementary access sequences: * (Ro) after (Ro), (Ro) after (Rr), (Rr) after (Ro), (Rr) after (Rr) * (Ro) after (Uo), (Ro) after (Ur), (Rr) after (Uo), (Rr) after (Ur) * (Uo) after (Ro), (Uo) after (Rr), (Ur) after (Uo), (Ur) after (Rr) * (Uo) after (Uo), (Uo) after (Ur), (Ur) after (Uo), (Ur) after (Ur) [188] replace para6 of 8.5.1 by 6 For all elementary access sequences of which neither uses the GET_ATOMIC or SET_ATOMIC intrinsics, (C848+1) the segment of the image owning the coarray shall be ordered with respect to the segment of the remote image referencing the coarray for the access sequences (Ro) after (Ur), (Rr) after (Uo), (Uo) after (Rr), (Ur) after (Uo), (Uo) after (Ur), (Ur) after (Uo). (C848+2) the segment of both remote images shall be ordered with respect to the image owning the coarray for the access sequences (Rr) after (Ur), (Ur) after (Ur). If the remote images are not the same image, the segments of the remote images shall also be ordered with respect to one another. (C848+3) the segment of two different remote images shall be ordered with respect to one another for the access sequence (Ur) after (Rr). 7 For all elementary access sequences by two different images which contain at least one coarray update, and of which exactly one image executes a GET_ATOMIC or SET_ATOMIC intrinsic on the coarray, (C848+4) the segments of the two involved images shall be ordered with respect to one another. 8 For all elementary access sequences by two different images which contain at least one coarray update, and for which at least one image executes a procedure in segments Pi, Pi+1 , ..., Pk for which the coarray is an effective argument associated with a non-coarray dummy argument, (C848+5) the coarray shall not be referenced, defined, or become undefined on the other image Q in a segment Qj unless Qj precedes Pi or succeeds Pk. Note 8.30: Any prescribed image ordering must be consistent with the semantics of the elementary access sequence it is associated with. Note 8.31 The processor may optimize the execution of a segment as if it were the only image in execution. In particular, this applies to elementary access sequences which do not involve updates, and to elementary access sequences which contain references and/or updates by the owning image only. Note 8.31+1 The model upon which the interpretation of a program is based is that there is a permanent memory location for each coarray and that all images can access it. In practice, an image may make a copy of a coarray (in cache or a register, for example) and, as an optimization, defer copying a changed value back to the permanent location while it is still being used. It is safe to defer this transfer until the end of the current segment and thereafter to reload from permanent memory any coarray that was not defined within the segment. It would not be safe to defer these actions beyond the end of the current segment since another image might reference the variable then. Note 8.31+2 The incorrect sequencing of image control statements can suspend execution indefinitely. For example, one image might be executing a SYNC ALL statement while another is executing an ALLOCATE statement for a coarray. Further comment: ~~~~~~~~~~~~~~~~ The above edits assume that VOLATILE coarrays have been disallowed, and be replaced by the atomic statements or intrinsics suggested in N1753.