J3/03-255r1 Date: 14 November 2003 To: J3 From: Aleksandar Donev Subject: Post Fortran 2003: Movable arrays Response is in J3/03-277 ______________________________________ Summary ______________________________________ There is a need for providing a dynamic array within Fortran that has far less storage overheads than allocatable or pointer arrays. This is possible because often it is known that the array will be contiguous and its size is either known or can be deduced from some other data which is available. I will call these movable arrays, based on a proposal I originally developed with John Reid, pasted almost literally in the Proposal section. Movable arrays are a mixture between assumed-size arrays and array pointers. They behave like assumed-size arrays in almost all contexts, but can be inside derived types and can change their address dynamically like pointers. Some will note similarities with Cray pointers. IMO, movable arrays should also provide the basis for interoperability with C arrays, rather then the current system where untyped pointers are used for everything having to do with C. This is in agreement with several comments to the FCD draft (but it was then too late for such a big change). ______________________________________ Motivation ______________________________________ Consider a data-structure representing a graph, where a vertex needs a list of neighbours. Presently, one can do: TYPE vertex INTEGER :: start_of_neighbors ! The list of neighbors END TYPE vertex where start_of_neighbours is an index into some array where the neighbour lists for all vertices are concatenated. This usage of integers instead of (or as) pointers is common on FORTRAN, and efficient, but has a very big defficiency. An integer index *assumes* that there is only one array where neighbours are stored. But in dynamic applications there may be several buffers used for storing vertices, and a pointer is much better then an integer: TYPE vertex INTEGER, DIMENSION(:), POINTER :: neighbors ! The list of neighbors END TYPE vertex Now, one has freedom of where the neighbors are stored and how they are allocated, and can simply use pointer assignment on neighbors to change this. For efficiency reasons, one would almost allocate several large neighbour arrays when initializing the graph, and then chop them into small pieces, one for each vertex, using pointer assignment. An example code segment might be: TYPE(vertex), DIMENSION(N) :: V ! Vertex set INTEGER, DIMENSION(N*d) :: N ! Neighbour set, d is maximal degree INTEGER :: vertex, counter, degree counter=1 DO vertex=1, N degree=... ! Assign value V(vertex)%neighbors=>N(counter:counter+degree) ! Pointer assignment counter=counter+degree END DO This approach is nice to use and allows one to have freedom in how the neighbour lists are allocated, for example. The above code has a big efficiency problem though, which is very important in practice. The array pointer neighbours in the type vertex usually takes a lot of memory, since the array pointer concept of Fortran 90 requires that adress, stride and bound information be stored. The compilers I have used usually pad this to some large value such as 16 or 32 bytes. This storage requirement is excessive and in many cases not necessary. For example, stride information is often not needed since the user knows that he will only use memory contigous blocks, and in some cases, such as skip linked search lists, the size (bound) information is also not needed. TYPE vertex INTEGER :: degree=0 INTEGER, DIMENSION(*), POINTER :: neighbours END TYPE vertex then the assignment of the neighbour list in the above code would be substituted with: V(vertex)%degree=degree V(vertex)%neighbors=>N(counter) ! Or N(counter:) ! Or N(counter:counter+degree) The syntax bellows does not use POINTER but MOVABLE to avoid conflicts with the existing words in the standard. ______________________________________ Proposal ______________________________________ 2. THE MOVABLE ARRAY Our starting point is the assumed-size array. We call our new sort of array 'movable' for the moment. We are not committed to this adjective, but we want to avoid 'pointer' since this is already used extensively in the standard with another meaning. A movable array can be moved from one target to another with the pointer assignment statement: real, movable :: a(*) real, target :: t(100) a => t and allocated and deallocated: allocate (a(30)) deallocate (a) We do not want any array copying to take place with such a pointer assignment, so the right-hand side must be an array that is movable, assumed-size, allocatable, or explicit-size. It must have the same type and must have the target attribute. An element of such an array is allowed, too (as for the actual argument corresponding to an assumed-size dummy array). The ranks are not required to agree (again, as for the actual argument corresponding to an assumed-size dummy array), and in such cases array element sequence association between the target and the movable array is implied. Deallocation will be allowed only if the target was allocated as a movable array. Movable arrays can also be null, tested for null, and tested for being associated with the same target: real, movable :: a(*), b(*)=>null(), c(*) ... a => null() nullify(c) if (associated(b)) then ... if (associated(a,b)) then The two-argument version tests only that a and b share the same first element. We expect the implementation of a movable array to be like that for an assumed-size array. Mostly, its descriptor will consist of an address only. However, a debugging compiler might also hold its size and a flag to indicate whether the target was allocated as a movable array. An explicit interface will be required like for allocatable array arguments. If the dummy argument is a movable array, the actual argument must be a movable array. If the actual argument is a movable array, the dummy argument must be a movable, explicit-shape or assumed-size array. We will also permit movable arrays as components of types: type my_data real, movable :: a(*) => null() end type my_data and we permit multi-rank movable arrays: subroutine mine(lda) integer :: lda real, movable :: a(lda,*) type my_data integer :: lda real, movable :: a(lda,*) end type my_data The bounds must be defined whenever a movable array is referenced and must not be altered while a movable array is associated with a target. If a multi-rank movable array is a component of a type, the bounds of its leading extents must be integer expressions depending only on other components of the type. For the sake of optimization, we propose that the rules that apply to assumed-size arrays with respect to aliasing should apply also to movable arrays. While a movable array is associated with a target, action that affects value of the target must be taken through the movable array. 3. INTEROPERABILITY WITH C A movable array is interoperable with a C pointer provided their types correspond. It may be a component of a type that corresponds to a C struct type with a C pointer of a corresponding type in the corresponding position. Entities of such types may be passed in a procedure call under the existing rules. If a movable array appears directly in an argument list, the VALUE attribute refers to the descriptor (usually just an integer holding the address). If the VALUE attribute is present, the corresponding actual argument must be a C pointer whose referenced type is interoperable with the type of the movable array. If absent, the corresponding actual argument must be a pointer to a pointer whose referenced type is interoperable with the type of the movable array. For example, void AllocateIntegerArray(int ** a, const int n) { *a=(int *) malloc(n*sizeof(int)); } is interoperable with: interface subroutine AllocateIntegerArray(a, n), bind(C) use iso_c_binding integer(c_int), dimension(*), movable :: a integer(c_int), intent(in), value :: n end subroutine AllocateIntegerArray end interface Note that a movable array provides a way to dereference some C pointers.