To: J3                                                     J3/14-185
From: John Reid
Subject: Stalled images
Date: 2014 June 09
References: N2007, N2013

Discussion
----------

In my vote (N2013), I said "There has not been adequate discussion of
an image stalling because of the failure of another image, which will
cause it to fail despite there being nothing wrong with its hardware."

I had in mind that executions of remote references such as
   a = b[i]
will cause the executing image to stall if image i has failed. As we
have it now, the processor has to regard the executing image as
permanently failed. Other images that are accessing data from it will
in turn stall and fail too. Thus one failure may lead to a large number
of working images being treated as failed. This is not satisfactory.

I made a suggestion in my vote for a change to the TS, but now think
that there is a simpler way to handle it. The failure of image i
essentially means that the team calculation has failed and the team
needs to be reconstituted. My suggestion is that there be a transfer of
control of the stalling image to the end of the team construct and that
the image be regarded as failing only within the construct. After the
synchronization at the END TEAM statement, it regains the status of a
working image. For execution within the initial team, there is no
END TEAM statement and the failure is permanent, as now.

An access within a statement such as
   a = b[parent::i]
to a failed image outside the current team should be treated as a
failure within the ancestor team.

The change will cause no alteration to A.1.2. The stalled images will
be reused.

Edits to N2007:
---------------

[13:18] At the end of the sentence "A failed image remains failed for
the remainder of the program execution" add "unless the failure occurs
as described in the next paragraph".

[13:21+] Add paragraph
   If an image selector identifies an image that has failed and a team
   other than the initial team, the executing image is treated as a
   failed image for the rest of the execution of the corresponding
   CHANGE TEAM block. The executing image shall transfer control to
   the END TEAM statement of the construct.

[13:21+] Change the last sentence to
   "The image selector {\tt b[i]} identifies the current team. The
   image selector {\tt b[i,team_id=1]} identifies the parent team. The
   image selector {\tt b[ancestor::i]} identifies the team
   {\tt ancestor}."
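
Illustration (not part of the edits)
------------------------------------

The following is a minimal sketch of where the proposed transfer of
control would take effect. It assumes the FORM TEAM / CHANGE TEAM /
END TEAM syntax and the TEAM_TYPE derived type as they appear in
Fortran 2018, which may differ in detail from the notation in N2007.
The program name, the team variable "half" and the way the images are
split are hypothetical and chosen only for illustration. The transfer
to END TEAM is the behaviour suggested in this paper and is described
only in comments; the code itself does not implement it.

```fortran
program stalled_image_sketch
  use, intrinsic :: iso_fortran_env, only: team_type
  implicit none
  type(team_type) :: half      ! hypothetical team variable
  real :: a, b[*]
  integer :: i, me

  me = this_image()
  b = real(me)

  ! Split the images of the initial team into teams 1 and 2.
  form team (1 + mod(me, 2), half)

  change team (half)
    ! Pick the next image of the current team (wraps around to 1).
    i = 1 + mod(this_image(), num_images())

    ! Remote reference within the current team. Under the proposal,
    ! if image i has failed, the executing image would be treated as
    ! failed only for the rest of this block and control would
    ! transfer to the END TEAM statement below.
    a = b[i]

    ! ... remainder of the team calculation, skipped in that case ...
  end team

  ! After the synchronization at END TEAM the image would again be a
  ! working image of the initial team and could join a new team.
  print *, 'image', me, 'continues in the initial team'
end program stalled_image_sketch
```

If the same remote reference were executed outside any CHANGE TEAM
construct, that is, in the initial team, there would be no END TEAM
statement to transfer to and the failure would remain permanent, as
stated in the discussion.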