To: J3 12-178 From: Bill Long Subject: Coarray collectives Date: 2012 October 3 Reference: WG5/N1930, 12-170 Discussion: ----------- This paper proposes syntax, semantics, and edits for feature 3. Collectives in N1930: A collective subroutine is an intrinsic subroutine that is executed by a set of images. It performs a computation based on values on the images of the set. Collective subroutines offer the possibility of substantially more efficient execution of reduction operations than would be possible by non-expert programmers. Corresponding routines are widely used in MPI programs. C1: A call to a collective subroutine is not an image control statement. However, such a call shall appear only in a context that allows an image control statement. Even though calls to collective subroutines involve internal synchronization required by the usual rules for reference and definition of subroutine arguments, they do not facilitate ordering of segments. C2: If a collective subroutine is invoked on one image, it shall be invoked by the same statement on all images of the current team. C3: A collective subroutine based on a user-written procedure that applies the required operation to local variables shall be provided. In addition, because they are often needed, there should be specific collective subroutines for SUM, MAX, and MIN for intrinsic types for which the corresponding operations are defined. Forms that provide the result to just one image or to all the images involved should be provided. Beyond this, there should be a collective subroutine that broadcasts a value on one image to a set of images. Coindexed source and result arguments are not permitted. The syntax, semantics, and edits assume the Teams proposal in N1930. Syntax and Semantics -------------------- A collective subroutine is one that is invoked on each image of the current team to perform a calculation on those images and which assigns the value of the result on all of them or one of them. If it is invoked by one image, it shall be invoked by the same statement on all images of the current team. Each actual argument to a collective subroutine shall have the same bounds, cobounds, and type parameters as the corresponding actual argument on any other image of the current team. The SOURCE and RESULT arguments shall not be coindexed. On any two images calling a collective subroutine, the ultimate arguments for a coarray dummy argument that is an actual argument to the collective subroutine shall be corresponding coarrays as described in 2.4.7 of ISO/IEC 1539-1:2010. Five collective subroutines are provided: CO_BROADCAST (SOURCE, SOURCE_IMAGE) - a collective broadcast CO_MAX (SOURCE [,RESULT, RESULT_IMAGE) - maximum value CO_MIN (SOURCE [,RESULT, RESULT_IMAGE) - minimum value CO_REDUCE (SOURCE, OPERATION [,RESULT, RESULT_IMAGE) - reduction by a user-specified function CO_SUM (SOURCE [,RESULT, RESULT_IMAGE) - SUM reduction CO_BROADCAST copies the value of SOURCE on image SOURCE_IMAGE to the corresponding SOURCE in the calls on all of the other images in the current team. The SOURCE and RESULT arguments may be scalars or arrays. If the SOURCE argument is an array, the reduction or broadcast is done separately for each element of the array, broadside across the current team of images. The SOURCE and RESULT arguments of CO_MAX, CO_MIN, CO_REDUCE, or CO_SUM may be coarrays, but that is not required. If the RESULT argument is present the result of the computation is assigned to RESULT; otherwise the result of the computation is assigned to SOURCE. If the RESULT_IMAGE argument is present the result of the computation is assigned only on image number RESULT_IMAGE; otherwise the result is assigned on all images of the current team. Edits to J3/12-170: -------------------- [5:5-6] In 3.1 replace the placeholder text with "collective subroutine intrinsic subroutine that is invoked on the current team of images to perform a calculation on those images and assign the value of the result on one of all of them (7.2)" [13:3-5] In 7.1 replace the first two sentences with "Detailed specifications of the generic intrinsic subroutines ATOMIC_ADD, ATOMIC_AND, ATOMIC_CAS, ATOMIC_OR, ATOMIC XOR, CO_BROADCAST, CO_MAX, CO_MIN, CO_REDUCE, and CO_SUM are provided in 7.3." [13:10] In 7.2 replace the placeholder text with "A collective subroutine is one that is invoked on each image of the current team to perform a calculation on those images and which assigns the value of the result on all of them or one of them. If it is invoked by one image, it shall be invoked by the same statement on all images of the current team. Each actual argument to a collective subroutine shall have the same bounds, cobounds, and type parameters as the corresponding actual argument on any other image of the current team. The SOURCE and RESULT arguments shall not be coindexed. On any two images calling a collective subroutine, the ultimate arguments for a coarray dummy argument that is an actual argument to the collective subroutine shall be corresponding coarrays as described in 2.4.7 of ISO/IEC 1539-1:2010." [15:8+] At the end of 7.3 add new subclauses: "7.3.6 CO_BROADCAST (SOURCE, SOURCE_IMAGE) Description. Broadcast of a value to all images of the current team. Class. Collective subroutine. Arguments. SOURCE shall be a coarray. It is an INTENT(INOUT) argument. SOURCE becomes defined on all images of the current team with the value of SOURCE on image SOURCE_IMAGE. SOURCE_IMAGE shall be type integer. It is an INTENT(IN) argument. Its value shall be the image index of one of the images in the current team. Example. If SOURCE is the array [1, 5, 3] on image one, after execution of CALL CO_BCAST(SOURCE,1) the value of SOURCE on all images of the current team is [1, 5, 3]. 7.3.7 CO_MAX (SOURCE [, RESULT, RESULT_IMAGE]) Description. Maximum value of elements on the current team of images. Class. Collective subroutine. Arguments. SOURCE shall be of type integer, real, or character. It is an INTENT(INOUT) argument. If it is a scalar, the computation result is equal to the maximum value of SOURCE on all images of the current team. If it is an array, the value of the computation result is equal to the maximum value of all the corresponding elements of SOURCE on the images of the current team. If RESULT is not present, value of the computation result is assigned to SOURCE. If REULT is present, SOURCE is not modified. RESULT (optional) shall be of the same type, type parameters, and shape as SOURCE. It is an INTENT(OUT) argument. If RESULT is present, the value of the computation result is assigned to RESULT. RESULT_IMAGE (optional) shall be type integer. It is an INTENT(IN) argument. Its value shall be the image number of one of the images in the current team. If RESULT_IMAGE is present and RESULT is present, the result of the computation is assigned to RESULT on image RESULT_IMAGE and RESULT on all other images becomes undefined. If RESULT_IMAGE is present and RESULT is not present, the result of the computation is assigned to SOURCE on image RESULT_IMAGE and SOURCE on all other images becomes undefined. Example. If the number of images in the current team is two and SOURCE is the array [1, 5, 3] on one image and [4, 1, 6] on the other image, the value of RESULT after executing the statement CALL CO_MAX(SOURCE, RESULT) is [4, 5, 6] on both images. 7.3.8 CO_MIN (SOURCE [, RESULT, RESULT_IMAGE]) Description. Minimum value of elements on the current team of images. Class. Collective subroutine. Arguments. SOURCE shall be of type integer, real, or character. It is an INTENT(INOUT) argument. If it is a scalar, the computation result is equal to the minimum value of SOURCE on all images of the current team. If it is an array, the value of the computation result is equal to the minimum value of all the corresponding elements of SOURCE on the images of the current team. If RESULT is not present, value of the computation result is assigned to SOURCE. If REULT is present, SOURCE is not modified. RESULT (optional) shall be of the same type, type parameters, and shape as SOURCE. It is an INTENT(OUT) argument. If RESULT is present, the value of the computation result is assigned to RESULT. RESULT_IMAGE (optional) shall be type integer. It is an INTENT(IN) argument. Its value shall be the image number of one of the images in the current team. If RESULT_IMAGE is present and RESULT is present, the result of the computation is assigned to RESULT on image RESULT_IMAGE and RESULT on all other images becomes undefined. If RESULT_IMAGE is present and RESULT is not present, the result of the computation is assigned to SOURCE on image RESULT_IMAGE and SOURCE on all other images becomes undefined. Example. If the number of images in the current team is two and SOURCE is the array [1, 5, 3] on one image and [4, 1, 6] on the other image, the value of RESULT after executing the statement CALL CO_MIN(SOURCE, RESULT) is [1, 1, 3] on both images. 7.3.9 CO_REDUCE (SOURCE, OPERATION [, RESULT, RESULT_IMAGE]) Description. General reduction of elements on the current team of images. Class. Collective subroutine. Arguments. SOURCE is an INTENT(INOUT) argument. If it is a scalar, the computation result is the value that would be obtained by applying the function specified by the OPERATION argument to the results of applying the collective subroutine on a processor-dependent set of images that is part of the current team and another subset of images that is the rest of the current team. If it is an array, the computation result is equal to a processor-dependent and image-dependent application of the collective subroutine on all the corresponding elements of SOURCE on the images of the current team. If RESULT is not present, the value of the computation result is assigned to SOURCE. If RESULT is present, SOURCE is not modified. OPERATION shall be a pure function that defines the binary, commutative operation to be performed. The function shall have two arguments of the same type, type parameters, and rank as SOURCE. If they are arrays, they shall be of assumed shape. The result shall have the same type and type parameters as SOURCE and the same shape as the first argument. If the arguments are arrays, the result of applying the function to corresponding sections of two arrays shall be the same as the corresponding section of the result of applying the function to the wholes of the two arrays. RESULT (optional) shall be of the same type, type parameters, and shape as SOURCE. It is an INTENT(OUT) argument. If RESULT is present, the value of the computation result is assigned to RESULT. RESULT_IMAGE (optional) shall be type integer. It is an INTENT(IN) argument. Its value shall be the image number of one of the images in the current team. If RESULT_IMAGE is present and RESULT is present, the result of the computation is assigned to RESULT on image RESULT_IMAGE and RESULT on all other images becomes undefined. If RESULT_IMAGE is present and RESULT is not present, the result of the computation is assigned to SOURCE on image RESULT_IMAGE and SOURCE on all other images becomes undefined. Example. If the number of images in the current team is two and SOURCE is the array [1, 5, 3] on one image and [4, 1, 6] on the other image, and MyADD is a function that returns the sum of its two integer arguments, the value of RESULT after executing the statement CALL CO_REDUCE(SOURCE, MyADD, RESULT) is [5, 6, 9] on both images. 7.3.10 CO_SUM (SOURCE [, RESULT, RESULT_IMAGE]) Description. Sum elements on the current team of images. Class. Collective subroutine. Arguments. SOURCE shall be of numeric type. It is an INTENT(INOUT) argument. If it is a scalar, the computation result is equal to a processor-dependent and image-dependent approximation to the sum of the value of SOURCE on all images of the current team. If it is an array, the value of the computation result is equal to a processor-dependent and image-dependent approximation to the sum of all the corresponding elements of SOURCE on the images of the current team. If RESULT is not present, value of the computation the result is assigned to SOURCE. If RESULT is present, SOURCE is not modified. RESULT (optional) shall be of the same type, type parameters, and shape as SOURCE. It is an INTENT(OUT) argument. If RESULT is present, the value of the computation result is assigned to RESULT. RESULT_IMAGE (optional) shall be type integer. It is an INTENT(IN) argument. Its value shall be the image number of one of the images in the current team. If RESULT_IMAGE is present and RESULT is present, the result of the computation is assigned to RESULT on image RESULT_IMAGE and RESULT on all other images becomes undefined. If RESULT_IMAGE is present and RESULT is not present, the result of the computation is assigned to SOURCE on image RESULT_IMAGE and SOURCE on all other images becomes undefined. Example. If the number of images in the current team is two and SOURCE is the array [1, 5, 3] on one image and [4, 1, 6] on the other image, the value of RESULT after executing the statement CALL CO_SUM(SOURCE, RESULT) is [5, 6, 9] on both images." [17:10+] In 8.3 add a new edit to clause 13 "{Move the three paragraphs of subclause 7.2 in this Technical Specification to follow paragraph 3 and Note 13.1 in Subclause 13.1 of ISO/IEC 1539-1:2010.} [17:10++] In 8.3 add a new edit to clause 13 "{In 13.5 Standard generic intrinsic procedures, para 2} After the line "A indicates ... atomic subroutine" insert a new line: C indicates that the procedure is a collective subroutine" [17:17+] In 8.3 add these entries at the end of the list of modifications to Table 13.1 in 13.5 of the base standard} "CO_BROADCAST (SOURCE, SOURCE_IMAGE) C Broadcast of a value to all images. CO_MAX (SOURCE [, RESULT, C Maximum value of elements RESULT_IMAGE]) on all images. CO_MIN (SOURCE [, RESULT, C Minimum value of elements RESULT_IMAGE]) on all images. CO_REDUCE (SOURCE, OPERATION C General reduction of elements [, RESULT, on all images. RESULT_IMAGE]) CO_SUM (SOURCE [, RESULT, C Sum elements on all RESULT_IMAGE]) images." [17:18] In 8.3, second set of edit instructions, replace "7.3.5" with "7.3.10".