10-235 To: J3 From: Nick Maclaren Subject: Interop TR: Restrictions for correctness Date: 2010 October 09 Reference: 10-165r2, 10-206, 10-207, 10-222, 10-224r1 This paper is about allocation, association, lifetime, descriptor update and related matters. My concern is to ensure the following: 1) That the specification is reasonably comprehensible to a reader who is not both an expert on the minutiae of the Fortran standard and an expert on Fortran implementation issues. Obviously, such a reader can be expected to look up references, if given a starting point. 2) That, to a reasonable extent, all conforming programs are implementable to the extent that they execute correctly. Obviously, 'reasonable' and 'correctly' should be interpreted relative to the quality of the main Fortran standard. I am afraid that neither are currently true, and I am doubtful that the problems are fixable without major revision. This paper proposes such a revision, though it is very unlikely that it can be completed this week, especially as I am not in Las Vegas. The same applies to converting the points it makes to the existing design. There is a further issue, which is that many of the comments on the original draft (as collated in N1766, with some replies in N1818) were on this area, and have not been addressed, implicitly or explicitly. I apologise for the lateness, length and incoherence of this, but the issues are evil ones and it needed to refer to the resolutions of the requirements in N1820. 1. SUMMARY OF ISSUES -------------------- The following are the main categories of problem that I have noticed: 1.1 Descriptor use is currently entirely in terms of update, which is semantically very different from the permitted actions on Fortran metadata. For example, the shape of an assumed-shape array can be changed only by creating a new array (e.g. a section). 1.2 There are currently almost no restrictions and constraints on how descriptors may be used, nor is there enough description to say that anything not stated explicitly is forbidden. 10-165r2.pdf [11:6-7] are the only two I can find. 1.3 The readers of this TR are likely to be primarily C programmers, and will assume that any action in the C code is permitted if it (a) uses only features that the TR describes (and does not forbid), and (b) is legal in C. 1.4 N1826 6.7.1.3 para. 3 and 6.7.3.2 para. 1 require all associated allocatable variables to be linked in some way, but the mechanisms provided by this TR (including 10-206) do not make that implementable. 1.5 The type matching and lifetime rules are very different between Fortran and C, and there are serious ambiguities about which ones should take precedence (given that we are talking about C code). While it is clearly possible to start addressing on these using the current design, I spent a complete day attempting that and gave up. The problems were such that a slightly different design would make half of them simply go away, and make the other half much easier to specify. That is to make the C functions provided correspond fairly closely with the basic Fortran primitives, and to separate the three different categories of object when appropriate. As part of that, I include a first draft at a set of semantic restrictions that are not associated with particular functions. Whatever the conclusion, both those and the semantic constraints implied by the alternative design need including in the TR. I do not guarantee that I have thought of everything, and I am sure that the wording is improper. 2. EXAMPLES OF PROBLEMS ----------------------- I apologise for the length of these, but I have attempted to make them comprehensible to people who are not necessarily C language and implementation technique experts. 2.1 Allocation and Association ------------------------------ The Fortran standard requires the allocation in Joe to be reflected in what the main program sees. The following is required to print False and True [N1826 6.7.1.3 p3 and 6.7.3.2 p1]. PROGRAM Main ! Assume a suitable interface block for Fred REAL, ALLOCATABLE :: x(:) PRINT *, ALLOCATED(x) CALL Fred(x) PRINT *, ALLOCATED(x) END PROGRAM Main SUBROUTINE Fred (y) ! Assume a suitable interface block for Joe REAL, ALLOCATABLE :: y(:) CALL Joe(y) END SUBROUTINE Fred SUBROUTINE Joe (z) REAL, ALLOCATABLE :: z(:) ALLOCATE(z(5)) END SUBROUTINE Joe Now, what if Fred were a BIND(C) function, especially one written in C? What constraints do we need to add to enable this? Or should it be forbidden? Or what? 2.1 Update and Creation ----------------------- Currently the TR does not say whether a C program can or cannot initialise a C descriptor directly (see designation in C99 6.7.8), copy it using memcpy(), create it using a base address that was derived from an argument descriptor, or even update an argument descriptor directly. This cannot simply be forbidden, because 5.2.7 uses it specifically as a way of meeting 9c under controlled conditions [12:31-33], and as a way of meeting 10a and 10b (in part) [12:31]; so also does 10-224r1. However, [12:34] does not restrict itself to pointer arrays. But there are a LOT of constraints that are imposed by Fortran, such as that the attributes (as in 5.2.5 p3) and rank of an existing object may not be changed, the descriptors for INTENT(IN) or assumed-shape arrays may not be changed in any way, the dimension triples are limited in form (constraint 5), and that all allocatable descriptors are linked (see example 2.1 above). It is possible to change the rank of a contiguous assumed-shape array in Fortran by passing it through explicit-shape intermediaries (i.e. using sequence association). This feature is quite important to a few algorithms, such as FFTs, and needs no facilities that are not in the current TR. Many users will do that, and the obvious way is by constructing a new assumed-shape array starting from a base pointer obtained by CFI_address. At the very least, it needs to be clear whether that is permitted. 2.3 Lifetime ------------ This is an area where allocatable and (Fortran) pointer objects and internal procedures with BIND(C) as arguments expose the semantic differences between C and Fortran. Without going into the complicated details, here are some of the issues: 1) The lifetime of an argument descriptor is different in Fortran and C, because it may be needed for copy-out following the return, but C has nothing except call by value. Note that, because the descriptor is passed by the value of the address, it is legal for C to free its memory following its last use and before the return. That is likely to confuse Fortran! I am not just being legalistic, as there are several interfaces that work just like that, and it is apparently reasonable if C calls Fortran calls C. 2) The lifetime of an object created by CFI_allocatable is complicated. Consider a C function that creates a new descriptor, allocates space using CFI_allocate, calls Fortran and then returns. Is he expected to call CFI_deallocate before that C function call returns? As we know, but C programmers will not, the answer is that he is expected to unless that variable is a pointer variable and it was passed to the Fortran with the TARGET attribute, when he must NOT do that if the Fortran he called has retained any saved pointers to it. 3) Consider an object X that is the target of the global pointer object Y and was created by ALLOCATE. It is then passed (without the target attribute) to a C function Fred, which uses it internally, finishes with it and (as a last action) nullifies Y. That is legal in C but, under what conditions, is it forbidden by the TR? 2.4 Type and Storage Compatibility ---------------------------------- This area is even murkier, and I would much rather exclude the potential problems than describe them, but here are some examples: 1) The TR brings the matter of alignment to the fore, and Fortran and C have different rules; I have seen programs fail because of this. For example, is it permitted for Fortran to create a descriptor that uses what C views as invalid alignment, if C does nothing but manipulate the descriptor and then pass it onto Fortran? And conversely, of course. 2) Fortran and C have different rules about when storage defined as one type may be used for another type (note that this is NOT about using one type as another, but about using the storage). C has the concept of types that are not equivalent but where storage holding one may be used as the other. In N1826, this is not an issue, because arguments have no type, but the TR does. Is that forbidden by this TR or not? 3) Related to this, what are the rules about C setting the type field? C has at least three sets of rules for type equivalence, and Fortran has a different one. Which one applies? 4) Is it permitted for C to create a descriptor where some or all indexing operations would not point to part of the object and then pass it to another C function, if they were never used? Obviously, yes, until we consider example 2.1 above. ALTERNATIVE APPROACH -------------------- I believe that it is much easier to have a set of functions that are much closer to the Fortran primitives, and to separate actions that are semantically inconsistent. The wording of the constraints can then be a lot simpler, and they are much easier to get right. The other fundamental change in this proposal is that the C would not be allowed to create or update a descriptor except through the functions provided, in any way whatsoever. Reading one doesn't cause trouble. I have made quite a few wording changes to match C concepts better, and have have included changes from 10-203r1. Personally, I should prefer that unused arguments and fields were required to be set to a suitable null value, but I have not made that change. 5.2.6.2 int CFI_allocate ( CFI_cdesc_t * cdesc, const CFI_bounds_t bounds[] ); Description. CFI_allocate allocates memory for an existing object using the same mechanism as the Fortran ALLOCATE statement. On entry, the argument cdesc shall point to a C descriptor that describes an unallocated allocatable or a disassociated pointer object. If the rank in the C descriptor is zero, the argument bounds is ignored; otherwise it points to an array of length the rank to use for the bounds for the allocation; the stride members are ignored. The C descriptor is updated by this function. The function returns an error indicator. [[[ Note that I have removed the explicit constraint on the base address, because it is redundant. And, yes, 'array' arguments in C are evil. ]]] 5.2.6.2+ int CFI_create ( CFI_cdesc_t * cdesc, void * base_addr, CFI_attribute_t attribute, CFI_type_t type, size_t elem_len, CFI_rank_t rank, const CFI_bounds_t bounds[] ); Description. CFI_allocate creates a new object, allocates memory for it, and initialises the C descriptor pointed to by cdesc. On entry, the argument cdesc shall point to a C object large enough to hold a C descriptor of the specified rank; it shall not point to an existing C descriptor. If the argument base_addr is NULL, the memory is allocated as if by a call to CFI_allocate, otherwise it shall be appropriately aligned (ISO/IEC 9899:1999 3.2) for an object of the specified type, and is used to set the base address of the object. The argument attribute shall be one of CFI_attribute_assumed, CFI_attribute_allocatable or CFI_attribute_pointer. The argument type shall be one of the type names in table 5.2. The argument elem_len is ignored unless type is CFI_type_struct, in which case it is the size of the structure and shall be greater than zero. The argument rank is the rank of the object and shall be between 0 and 31 inclusive. If the argument rank is zero, the argument bounds is ignored; otherwise it points to an array of length the rank to use for the bounds for the allocation; the stride members are ignored. The function returns an error indicator. If the argument attribute is CFI_attribute_assumed, the argument base_addr shall not be NULL, and the object will be a scalar or array without the ALLOCATABLE or POINTER attributes. If the argument attribute is CFI_attribute_allocatable, the argument base_addr shall be NULL, and the object will be allocatable. If the argument attribute is CFI_attribute_pointer, the object will be a pointer. 5.2.6.2++ int CFI_setpointer ( CFI_cdesc_t * cdesc, void * base_addr, const CFI_bounds_t bounds[] ); Description. CFI_setpointer updates an existing pointer object. On entry, the C descriptor pointed to by cdesc shall describe a pointer object. If the argument base_addr is NULL, the pointer is nullified; otherwise, the new base address is set to base_addr, which shall be appropriately aligned (ISO/IEC 9899:1999 3.2) for an object of the specified type. If the rank in the C descriptor is zero, the argument bounds is ignored; otherwise it it points to an array of length the rank to use for the bounds for the allocation; the stride members are ignored. The C descriptor is updated by this function. The function returns an error indicator. 5.2.6.3 int CFI_deallocate ( CFI_cdesc_t * cdesc ); Description. CFI_deallocate deallocates memory using the same mechanism as the Fortran DEALLOCATE statement. On entry, the C descriptor pointed to by cdesc shall describe an allocated allocatable object or a pointer associated with a target that was allocated using CFI_allocate or the Fortran ALLOCATE statement. The C descriptor is updated by this function. The function returns an error indicator. [[[ Note that I have removed the explicit constraint on the base address, because it is redundant. ]]] [[[ 5.2.6.3 CFI_is_continuous is discussed elsewhere. ]]] [[[ 5.2.6.4 CFI_address would also be needed, as in 10-222. ]]] 5.2.6.5 int CFI_section ( CFI_cdesc_t * result, const CFI_cdesc_t * source, const CFI_bounds_t bounds[] ); Description. CFI_section initialises the C descriptor pointed to by result to refer to a section of an array pointed to by source; it shall not point to an existing C descriptor. On entry, the argument result shall point to a C object large enough to hold a C descriptor of the appropriate rank. The C descriptor pointed to by source shall describe an assumed-shape array, an allocated allocatable array, or an associated pointer array. The argument bounds provides the bounds to use for the allocation and the number of elements shall be greater than or equal to the rank. On exit, the C descriptor result pointed to by result will describe an assumed-shape array. The function returns an error indicator. All strides in the bounds array shall be non-zero. There shall be a reordering of the dimensions such that the absolute value of the stride of one dimension is not less than the absolute value of the stride of the previous dimension multiplied by the extent of the previous dimension. [[[ 5.2.6.6 is deleted. ]]] 5.2.7 Use of C descriptors A C descriptor shall not be initialised, updated or copied other than by calling the functions specified here. A C descriptor passed as a dummy argument shall not be updated if it has attribute INTENT(IN), or describes an assumed-shape array. Calling CFI_allocate or CFI_deallocate causes any Fortran variables and other C descriptors associated with that C descriptor to be updated [N1826 6.7.1.3]. A C descriptor that is passed as an argument BIND(C) procedure call shall describe an object that is acceptable to both Fortran and C with the type specified in its type member. If the argument has the INTENT(IN) or INTENT(INOUT) attributes, or its value or that of any element is used, all of its elements shall contain legal values of that type. 5.2.7+ Restrictions on lifetimes When a Fortran object or internal procedure is deallocated, execution of its host instance is completed, or its allocation or association status becomes undefined, all C descriptors and C pointers to any part of it become undefined, and any further use of them is undefined behaviour (ISO/IEC 9899:1999 3.4.3). A C descriptor received as a dummy argument becomes undefined on return from the procedure call. If the dummy argument does not have any of the TARGET, ASYNCHRONOUS or VOLATILE attributes, all C pointers to any part of the object it describes become undefined on return from the procedure call, and any further use of them is undefined behaviour. If a C descriptor is passed as an actual argument, its lifetime and that of the object it describes (ISO/IEC 9899:1999 6.2.4) shall not end before the return from the procedure call. A Fortran pointer variable that is associated with a C descriptor shall not be accessed beyond the end of the lifetime of the C descriptor and the object it describes.