To: J3 10-109 From: John Reid Subject: Comment GB-6 Error Termination (Subclause 2.3.5) Date: 2010 February 01 References: WG5/N1802, WG5/N1803, J3/09-007r3 DISCUSSION The current wording overspecifies error termination in paragraph 4 of 2.3.5 (Execution sequence). Specifically, requiring one image to be able to force others into synchronisation without them executing any special statement is a heavy burden on implementors, and may not always be feasible. The current wording specifies that, if image 1 executes an ALL STOP statement, and image N is not responding, image 1 should NOT proceed to termination and close all of its output files, but should simply hang. Also, the current specification makes it impossible for a Fortran compiler to implement coarrays on a basis of MPI. The first paragraph of the specification of MPI_Abort is: This routine makes a "best attempt" to abort all tasks in the group of comm. This function does not require that the invoking environment take any action with the error code. However, a Unix or POSIX environment should handle this as a return errorcode from the main program. And the first paragraph of the specification of MPI_Finalize is: This routine cleans up all MPI state. Each process must call MPI_FINALIZE before it exits. Unless there has been a call to MPI_ABORT, each process must ensure that all pending non-blocking communications are (locally) complete before calling MPI FINALIZE. There is no requirement for MPI_Abort to attempt to synchronise with the other processes, and many implementations do not do so - they merely request the operating system or job scheduler to kill the MPI job. PROPOSAL This is to reorganise the order of the wording now in paragraphs 4 and 7 of 2.3.5, so as to specify the exact process for normal and error termination. The new paragraph 4 states that the intent of ALL STOP is to terminate the program immediately, but leaves it largely unspecified as to how. EDITS [33:24-29] In 2.3.5 "Execution sequence", replace paragraph 4: "Termination of execution of an image occurs in three steps: initiation, synchronization, and completion. All images synchronize execution at the second step so that no image starts the completion step until all images have finished the initiation step. Termination of execution of an image is either normal termination or error termination. An image that initiates normal termination also completes normal termination. An image that initiates error termination also completes error termination. The synchronization step is executed by all images. Termination of execution of the program occurs when all images have terminated execution." by: "Termination of execution of a program is either normal termination or error termination. Normal termination occurs only if all images initiate normal termination and occurs in three steps: initiation, synchronization, and completion. In this case, all images synchronize execution at the second step so that no image starts the completion step until all images have finished the initiation step. Error termination occurs if any image initiates error termination. Once error termination has been initiated on an image, error termination is initiated on all images that have not already initiated error termination. Termination of execution of the program occurs when all images have terminated execution." [34:2-3] In 2.3.5 "Execution sequence", delete paragraph 7: "If an image initiates error termination, all other images that have not already initiated termination initiate error termination."