J3/06-180

Date:    08-May-2006
To:      J3
From:    Bill Long
Subject: Co-arrays to be required

In paper 06-165 the authors contend that the co-array proposal, UK-001, has significant problems and should be an optional, rather than mandatory, part of the standard. This paper presents arguments to the contrary.

Background
----------

1) Fortran has remained an active language for 50 years by repeatedly adapting to evolving hardware and software environments, while maintaining its primary strengths of ease of use for scientific and mathematical programming and efficient program execution.

2) The primary shift occurring now in computer hardware is toward increased parallelism. Over half of the processor chips to be shipped this year will support multiple independent threads of execution. By 2010, the time frame of relevance for initial shipments of compilers for Fortran 2008, there will likely be zero single-threaded chips made for general purpose computers.

3) Effective programming of parallel systems has proved difficult with the most widely available tools, OpenMP and MPI. OpenMP scales poorly and MPI is difficult to use and poorly integrated with Fortran concepts. DARPA has recognized the fundamental problem of poor productivity in programming modern systems and is now conducting a competition for the development of a new language. Of the three remaining candidates, all are explicitly designed for parallel execution environments.

4) Co-arrays as part of the Fortran language provide a dramatic improvement in programmer productivity compared to MPI. As an adaptation to hardware changes, as well as an easy to use and easy to understand model, co-arrays fit squarely into the natural evolution of Fortran. Because of impressive experience with existing implementations, both the US departments of Defense and Energy are strongly encouraging the standardization and use of Fortran with co-arrays as well as UPC, the C extension with essentially the same parallel programming model.
(See upc.lbl.gov for more information on UPC.)

5) With the public release of the feature list for Fortran 2008, listing co-arrays as the major feature of the required sublist, J3 and WG5 have received enthusiastic congratulations from many high profile user groups, especially ones for whom Fortran is still the principal programming language. The major complaint about co-arrays from the user community is the lack of code portability. The central goal of language standardization is to ensure that portability, which is why the inclusion of co-arrays as a nonoptional part of Fortran is of such high importance. Because of its inclusion in Fortran 2008, some customers are already requesting implementations in the 2007-2008 time frame, and (at least in the Opteron space) compiler vendors are starting to commit to this timetable.

If we fail to meet these users' and vendors' expectations, the credibility of, and support for, J3 and WG5 will be severely damaged. There are two potential results. One is that implementation of co-arrays is abandoned by most vendors and the users migrate future codes to UPC, which is already more widely available. The other is a repeat of "Cray pointers" on a large scale, with J3 becoming irrelevant and the de facto definition of Fortran being taken over by a subset of the vendors and US government agencies. Neither outcome is desirable.

Specific issues raised in paper 06-165
--------------------------------------

Lines beginning with >> are taken from 06-165.

>>- the conceptual model is restricted to a small class of parallel
>>  machines

All systems generally fall into one of these four groups:

1) One single-threaded processor and associated memory. On such a system, the implementation of co-arrays becomes trivial because there is only one image; parallel execution is a non-issue. The compiler is required only to accept the new syntax and diagnose constraint violations.
The generated code can ignore the image indices since either they all have the value 1 or the program has an error. All of the synchronization routines become trivial. However, as noted above, such systems will be essentially extinct by 2010.

2) Multiple processor threads sharing a common memory space. On such a system, the compiler can partition the user's memory such that each image and its associated processor thread has one of the blocks. Alternatively, each co-array could be separately partitioned. Accessing memory in other images amounts to trivial address manipulation followed by ordinary load and store operations. Implementation of the synchronization statements will be no more difficult than, and can make use of, the similar technology used for OpenMP. The conceptual model maps easily to this architecture, and the implementation is simple and efficient.

3) Systems with distributed memory, and for which each processor or small group of processors has fast access to its local memory, but also has access to memory in other parts of the system without interrupting the processors for which that memory is local. The ability to support remote memory access is provided by a high-capability network integrated into the system. The co-array model is easily adapted to this class of systems, as proven by existing implementations.

4) Systems with distributed memory, but for which the network does not support direct remote memory access. This is typical of the low-cost cluster systems. Features have been added to the co-array proposal specifically to address performance issues on such systems.

In summary, all systems that support efficient parallel processing work well with co-arrays, and users of those systems with poor parallel characteristics still benefit from ease of use. Furthermore, it is reasonable to assume that system performance will continue to improve in the future.
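
The address manipulation described for group 2 can be illustrated with a small model. This is only a sketch, written in Python rather than Fortran, and it is not taken from the proposal or from any existing implementation: the class SharedMemory and the names num_images, block_size, address, load and store are hypothetical, and a real compiler would emit the equivalent load and store instructions directly. It assumes the first scheme mentioned above, in which the shared memory is split into one equal block per image.

```python
# Illustrative model only: one flat shared memory, split into equal
# blocks, one block per image. All names here are hypothetical.

class SharedMemory:
    def __init__(self, num_images, block_size):
        self.num_images = num_images
        self.block_size = block_size
        # One contiguous store standing in for the shared address space.
        self.cells = [0] * (num_images * block_size)

    def address(self, offset, image):
        """Map (offset within a block, 1-based image index) to a flat address."""
        if not 1 <= image <= self.num_images:
            raise IndexError("image index out of range")
        if not 0 <= offset < self.block_size:
            raise IndexError("offset outside the image's block")
        return (image - 1) * self.block_size + offset

    def load(self, offset, image):
        # An ordinary read once the address has been computed.
        return self.cells[self.address(offset, image)]

    def store(self, offset, image, value):
        # An ordinary write once the address has been computed.
        self.cells[self.address(offset, image)] = value


mem = SharedMemory(num_images=4, block_size=8)
# Write into the block owned by image 3, analogous to a co-array
# reference that carries an explicit image index.
mem.store(offset=2, image=3, value=42)
```

With num_images set to 1 the image term in address() is always zero, which matches the observation for group 1 that the generated code can ignore the image indices.
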
>>- the I/O model is overly complicated

The proposal includes only the simplest extensions for parallel I/O. Suggestions for expansion in this area have been resisted. Noncolliding read and write operations to records of a direct access file connected to multiple images are allowed. Also allowed is writing to a sequential access file connected to multiple images. Both of these are in direct response to the expressed needs of users. Users frequently assume that writing to a sequential file from the processes in a parallel job coded with Fortran and MPI should work in the "obvious" way. Whether this works is highly nonportable. Parallel I/O very much needs standardization, at least for the simple cases included in the co-array proposal.

>>- the synchronization model is overly complicated and too low-level

For many applications, sync_all, sync_images, and sync_memory are sufficient. The notify and query statements, while they can be simulated with user-written code, were included for user convenience. The sync_team statement was included to allow for special optimization of that particular case. The goal is to provide the simplest synchronization, sync_all, for simple situations, while also providing more powerful methods for more complicated situations. None of the included synchronization capabilities is new or untested technology.

>>- significant further technical work on the proposal is still needed so there is a serious risk of delaying the whole revision schedule

Significant further technical work has been completed between meetings 175 and 176. It is our belief that the feared "serious risk" no longer exists.

>>- the co-array model is presented as the normal execution model with single processors as the exception; this is the wrong emphasis given that much of the new language is irrelevant to most users

There is only one execution model presented. There is no "abnormal" one available.
Whether the program is executing on one image or multiple images is irrelevant to most of the standard. Where it does matter, is there any other sensible presentation for a unified language? We do not, for example, call out arrays with only one element as the "normal" case (which can be handled as if they were scalars), and then distinguish the "abnormal" case of larger arrays that need different handling. While the concept of parallel execution might seem "irrelevant to most users" today, there is very little reason to expect that will be the case by 2010.

>>- the proposals as currently formulated are large and experimental and should be evaluated in practice before being standardized

The proposal differs only minimally from implementations that have spanned two different systems over a period of 10 years. User enthusiasm for the standardization of co-arrays is so high because of that extensive and positive experience. Co-arrays are not even remotely experimental.

>>We believe the facts that co-arrays are of minority interest and that
>>these proposals in detail are untested in practice point strongly to
>>their being an optional, rather than a mandatory, part of the Fortran
>>standard.

The "minority interest" will grow rapidly to a majority with available implementations, and even at today's level exceeds the interest in some other features incorporated into Fortran. Co-arrays have been tested extensively in practice and proven valuable. Since both of the premises appear to be false, the conclusion should be rejected.

>>We believe that the best way to do this would be for
>>co-arrays to appear as a Technical Report (without necessarily
>>guaranteeing inclusion in the next standard).

Failure to mandate co-arrays defeats the central point and value of the standard. It would be a recipe for failure of the feature, and the irrelevance of the whole language in the long run.
>>This would allow those
>>vendors whose primary markets are for single processors to avoid the
>>burden of implementing features of little interest to their customers.

If vendors find a continuing market for single processor compilers, the burden to implement co-arrays in such a compiler is small.

>>We urge J3 to proceed on these lines and to so recommend to WG5.

On the contrary, finishing the work remaining and completing Fortran 2008 on schedule is the right course of action.