J3/99-250 Date: 4 November 1999 To: J3 From: William B. Clodius Subject: Reallocation 1. Introduction One common request for Fortran standardization is the ability to "reallocate" an array: i.e., the ability to dynamically change the extent of an array, while retaining, in one form or the other, the values and definition status of various elements of the array. This capability can be implemented in standard Fortran 95, but requires several lines of code specialized for each data type and rank of array, for what can be implemented as a single standard defined statement or procedure. From the users point of view there are three gains from the availability of a REALLOCATE statement: a decrease in the number of lines of code, with a corresponding decrease in errors and increase in programmer's efficiency; a standard notation for a common operation, with a corresponding increase in program legibility; and the use of a standard notation for a common operation should facilitate compiler optimizations for this idiom. In order to decide whether this capability should be standardized the committee needs to consider how this capability would be used, how often it would be used, how to maximize its safety, and how it would be implemented 2. Reallocation use Reallocation is in many ways an array based language substitute for strongly typed linked lists. They are almost invariably used in contexts where the final data size is initially unknown, but the type of data is known, i.e., in reading a stream of data stored in a data file. Reallocation allows the user to increase the size of the array as needed. The majority, perhaps the vast majority, of uses are satisfied by allocation in one dimension. However, a few uses would benefit from being able to change multiple dimensions at once, in essence a subroutine version of the intrinsic function RESHAPE where the SOURCE argument to RESHAPE becomes an INTENT IN OUT argument that, subsequent to the execution of the subroutine, is defined similar to the result of RESHAPE. Some uses of reallocation require that the common elements of the array before and after reallocation map to one another have a special mapping, i.e., either by array index or by subscript order value. While these two mappings are identical if only the upper bound of the final index changes, they are not identical in other cases. The vast majority of applications that could benefit from changing other array bounds on reallocation require that a mapping by array indices. Some such applications, however, can benefit by mapping according to subscript order value. Still in Fortran the natural dimension for one dimensional reallocation is also the final dimension, and for the majority of uses a one dimensional reallocation is the primary need so it is possible to provide this capability in a useful form without making the distinction between the two mappings if reallocation is confined to the upper bound of the final dimension. >From the demand for this capability I suspect that it would receive wider use than many of the intrinsic procedures added in Fortran 95. 3. Implementation issues Discussions of reallocation often focus on the issue of efficiency. From an implementor's point of view there are two classes of operations that are associated with this type of operation: 1. Acquiring more memory from the system either to hold the array if it has grown larger, or to hold temporaries during copying operations 2. Copying the array elements from the old array definition to the new array definition. Provided the reallocation maintains subscript order mapping both costs can be minimized by initially over allocating, either by the processor or the user. However, it is difficult for the processor to efficiently over allocate if reallocation is to maintain index mapping, if index mapping is required minimizing these costs therefore requires that the user explicitly over allocate. >From a user's point of view there are at least two issues with reallocation implementation: 1. Not accidentally choosing a difficult to optimize data structure access, when an easy to optimize form is available. The greatest efficiency is obtained when copying is minimized, i.e., when reallocation retains subscript order values. As most users appear to want to reallocate only in one dimension, but retain array index mapping, this implies doing the reallocations so that only the bounds of the rightmost dimension will change. 2. Provide a hint (i.e., optional keyword) to the processor that over allocation may be useful, or that over allocated memory might be freed for use elsewhere. 4. Safety issues There is one major safety issue with regard to reallocation. The interaction of Fortran's pointers with reallocation is invalid a large variety of special cases, i.e., pointers to array sections, to arrays without the allocatable attribute, to dummy arguments, etc, and is difficult to define in other cases, what is the effect of reallocating a pointer to a named allocatable array. As a result I suggest that reallocation be forbidden for pointers and that pointer association become undefined if a pointer's target is reallocated. This is much less of a constraint for Fortran 200x, with its increased flexibility for allocatable arrays, then it would be earlier standards. A minor safety issue is the effect of reallocation on array lower bounds. I believe it is extremely rare for a user to want to change the lower bound on reallocation. However in other related contexts, i.e., allocation, the default is to set the lower bound to one. If this semantics is allowed in reallocation, reallocating arrays with lower bounds other than one may have surprising results. 5. Summary Many of the issues that have prevented adoption of reallocation in the past can be avoided, at the cost of reduced flexibility, by allowing reallocation to affect only the extent of the final dimension of an array and not allowing reallocation of pointers. The issues could also be addressed, at a cost in syntactic and implementation complexity, with corresponding language flexibility, by employing appropriate optional keywords. Additional optional keywords may also be useful as hints to the compiler. If this proposal is accepted for consideration by J3 the following issues should be addressed: 1. Should the language provide reallocation? 2. Should reallocation be confined to the rightmost dimension or should a more flexible reallocation capability be provided? 3. If a more flexible reallocation capability be provided should it: A. Map elements by index, or by array sequence order, or should the mapping be user selectable through an (optional?) keyword? B. Provide an alternative simpler syntax for changing the extent of the rightmost dimension? C. Allow changing the lower bounds of the array? 4. Should a keyword be provided for the ALLOCATE statement to indicate that the array may be subsequently reallocated so that over allocation may be useful, or for the REALLOCATE statement to indicate that no more reallocations may be expected so that over allocated memory may be freed? 5. Should reallocation of pointers be considered?