To: J3                                                       10-122
Subject: Reply to comments on interop TR ballot
From: Craig Rasmussen
Date: 2010 February 04
References: N1766, N1763, N1761, 06-128r1, WG5 worklist item j3-041

A letter ballot N1763 was taken regarding the technical content of N1761, "Further Interoperability of Fortran with C" (Interoperability of optional, assumed-shape, allocatable, pointer, assumed-type, and assumed-rank dummy arguments). Several people replied to this ballot with comments. This paper is a reply to some of these comments.

The format of the reply below is:

n) [Commenter(s)] Comment, possibly paraphrased
--- Reply text

1) [Van] A C descriptor has a flag that indicates whether a pointer is or is not associated. It's not always possible to tell. It is always possible to tell whether a pointer is or is not disassociated.
--- I believe the issue here is that "not associated" covers two cases: the pointer association status is either disassociated or undefined. It is fairly easy to keep track of the actions that cause a pointer to become disassociated, but not of those that leave its association status undefined. Having a binary flag that indicates disassociated would just move the problem to the "not disassociated" state, which would cover both associated and undefined. In addition, knowing that a pointer is disassociated is not really interesting to the user. Rather, the user probably wants to know whether the pointer is associated. Before we added this flag, we just used a non-NULL base address as an indication that the pointer was associated. One option is to delete this flag and go back to the old convention.

Edits: TBD

2) [Van] The "sm" component of a CFI_dim_t struct is specified to be measured in bytes. This should be in processor-dependent units.
--- The "sm" (stride multiplier) is intended to have the same units as the elem_len component of the C descriptor, and hence the units of the C sizeof operator. The C standard (6.5.3.4) specifies this unit to be "in bytes".
Changing the units of the stride multiplier to be potentially different seems like an unnecessary complication.

3) [Aleks, Nick] Use a flexible array member at the end of the cdesc struct to hold the bounds triples, rather than a fixed-size array.
--- The use of a fixed dim length allows usage as shown below:

    CFI_cdesc_t cdesc;

    /* set cdesc members */
    cdesc.fdesc = NULL;
    ...

    /* update the Fortran descriptor */
    CFI_update_fdesc(&cdesc);

We had discussed this option in subgroup. The alternative would be to provide alloc/free routines for the C descriptor. This invites memory leaks and makes the use of the feature unnecessarily complicated. The advantage of the flexible array member is a saving of memory space in most cases. The chosen alternative is simpler and less error prone, at the cost of a few extra bytes per descriptor.

4) [Mike] The file ISO_Fortran_binding.h is not included in the standard.
--- One of the ground rules for the TR is that no struct can be explicitly specified in the standard. Instead, only descriptions of the struct members are allowed. Including a copy of ISO_Fortran_binding.h would expose a suggested implementation of the cdesc struct, in violation of this rule. Also, the standard generally does not mandate specific implementations of things like this. Some vendors might, for example, want to add extra, hidden members to the struct for various purposes.

5) [Mike] The general approach of specifying a data structure seems awkward compared to the approach of standardizing the particular set of APIs that the programmers will want.
--- The initial version of the TR was based on a set of functions / macros that served as such an API. This quickly grew into an MPI-like monstrosity that would have been hard to use at best. We switched to the descriptor approach as cleaner and simpler to use. Also, minimizing the number of needed functions lowers the implementation and maintenance costs for vendors.

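The trade-off described in the reply to item 3 can be sketched in C as follows. This is only an illustration under stated assumptions: the demo_* type, member, and function names and the maximum rank of 15 are hypothetical and are not taken from N1761 or ISO_Fortran_binding.h.

```c
#include <stddef.h>
#include <stdlib.h>

#define DEMO_MAX_RANK 15   /* hypothetical maximum rank */

/* Hypothetical bounds triple; member names are illustrative only. */
typedef struct {
    ptrdiff_t lower_bound;
    size_t    extent;
    size_t    sm;          /* stride multiplier, in bytes */
} demo_dim_t;

/* Fixed-size variant: usable as an ordinary automatic variable. */
typedef struct {
    void      *base_addr;
    size_t     elem_len;
    int        rank;
    demo_dim_t dim[DEMO_MAX_RANK];
} demo_fixed_desc_t;

/* Flexible-array-member variant: storage must be sized at run time. */
typedef struct {
    void      *base_addr;
    size_t     elem_len;
    int        rank;
    demo_dim_t dim[];      /* C99 flexible array member */
} demo_flex_desc_t;

/* Bytes needed for a flexible descriptor of the given rank. */
size_t demo_flex_size(int rank)
{
    return sizeof(demo_flex_desc_t) + (size_t)rank * sizeof(demo_dim_t);
}

/* Allocate a flexible descriptor; the caller must free() it,
   which is where the risk of memory leaks comes from. */
demo_flex_desc_t *demo_flex_alloc(int rank)
{
    demo_flex_desc_t *d = malloc(demo_flex_size(rank));
    if (d != NULL) {
        d->base_addr = NULL;
        d->elem_len  = 0;
        d->rank      = rank;
    }
    return d;
}
```

With the fixed-size variant a caller simply declares "demo_fixed_desc_t d;" and pays for unused dim entries. With the flexible variant a rank-2 descriptor is smaller, but every descriptor needs a matching allocation and free.
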
6) [Mike] There is no "version" query for the structure or API.
--- Vendors are allowed to add extra members to the cdesc struct if they want. A version number might be a candidate. Some vendors already encode version information in modules to avoid incompatibilities. These are not visible to the user. It seems reasonable to keep any version information for the descriptor similarly invisible.

7) [Mike] There is no pointer in the Fortran descriptor pointing to a corresponding C descriptor, resulting in an asymmetry.
--- The Fortran and C descriptors are not intended to be symmetric. For many implementations the Fortran descriptor will be identical to the vendor's current dope vector. Requiring an extra field that serves no purpose creates needless work that would delay and discourage implementation.

8) [Nick] A "pointer of type void" is meaningless - C has no data type "void". The terminology should be fixed everywhere the wording is used.
--- In 6.2.5 Types, paragraph 19 of the C standard, "The void type..." is described. So I don't feel too bad about this usage. Elsewhere in the C standard, the phrase "pointer to void" is used, and this usage was preferred in a subsequent conversation with Nick.

9) [Nick] "The C descriptor is a struct of type CFI_cdesc_t" is ambiguous.
--- The intention is that the struct be defined as

    typedef struct { ....... } CFI_cdesc_t;

and that declarations of descriptors be of the form

    CFI_cdesc_t arg;

Improved wording is: "CFI_cdesc_t is a named structure type defined by a typedef. It can be used in variable declaration contexts such as "CFI_cdesc_t arg;""

10) [Nick] The specification of the "attribute" field is broken. Inter alia, it mixes type properties with attributes, does not allow assumed-shape arrays to be allocatable or pointers, and makes no reference to assumed-rank. I am not sure what it is proposing, or why. If it is needed, it should specify the attributes as bits in a mask.
--- Regarding the first sentence, I certainly hope we do not provide for assumed-shape arrays that are allocatable or pointers. No such things exist in Fortran, and we certainly don't want to create them as part of the TR. The final sentence does touch on a topic we had discussed before. Should some of the "flag" type data in the cdesc be packed into a bit mask, rather than being separate members of the struct? The advantages of the bit mask version are that there would be fewer members (a tiny space saving) and that the unused bits in the mask could be used for later expansion (the entity is a coarray, for example) without affecting the overall memory layout of the struct (a potentially more valuable benefit). The drawback is that users would have to extract the bits from the mask to get the desired information. We could supply constant masks that could be ANDed with the mask word in the cdesc to extract the values. For programmers like me this seems perfectly natural. Another, worse, alternative is to pile on yet more functions / macros to extract the various bits. The consensus during design of the struct was that the simple, separate members were easier to use and preferred.

11) [Nick] The description of the "state" field is baffling. Why should assumed-shape but not assumed-rank be included together with associated pointers and allocatable variables?
--- Assumed-shape variables, associated pointers, and allocatable variables are three non-overlapping categories of variables. Assumed-rank (and assumed-type) are concepts that apply only in the interface specified in the Fortran domain. Whether the corresponding Fortran interface for the function declared an argument to be assumed-rank or assumed-type is invisible to the C function. The incoming Fortran descriptor, and the corresponding C descriptor, will have the type and rank specified, based on the actual argument in the call.

12) [Nick] The type "intptr_t" is unsuitable for array indexing... "ptrdiff_t" is a far better type for the purpose.
--- Members of the C committee (WG14) were polled and it was suggested that size_t be used for variables that are count related.

13) [Nick] The specification must say which kind of C constants are defined in the file ISO_Fortran_binding.h, as C has several, with different properties. That would need careful design. Most of them should be preprocessor constants suitable for use in #if directives, but perhaps not all.
--- It is expected that these constants will be defined as macros in ISO_Fortran_binding.h, for example,

    #define CFI_xxx value

A member of the C committee WG14 has suggested the phrase, "CFI_xxx is a macro defined in the file ISO_Fortran_binding.h that expands to an integer constant expression."

14) [Nick] The layout of C structures is very dependent on the compiler options. In the existing Fortran standard, the processor can (in theory, at least) remap structures. Because N1761 proposes structures to use as the actual interface descriptors, that is not feasible. The matter needs consideration.
--- The Fortran descriptors that are passed into a C function are internally created by the compiler, which has control over the layout. The internal structure of the Fortran descriptor in the C domain is visible only to the CFI_xxx functions supplied by the vendor. I would assume these are pre-compiled as part of a library and the vendor has complete control over the options used for compilation. The layout of the C descriptor as seen by the library routines would have a particular form. So you are saying that the user could compile his code with other options that would lead to an incompatible layout? If that's the case, how do users handle the raft of system header files with struct definitions? I've not seen any instructions in man pages saying the user had to specify certain compiler options if system header files are used. Could you explain, with an example, what the real problem is here?

15) [Nick] The description of stride "equal to the difference between the subscript values of consecutive elements of an array along a specified dimension" makes no sense when applied to a dummy argument - it is always one. Fortran has no concept of the strides of the actual arguments being visible to the called procedure.
--- True (modulo playing games with C_LOC()) for the case of Fortran calling Fortran. However, C needs access to non-contiguous array sections without resorting to temporary copies. The stride referred to is that obtained from the descriptor that the C programmer sees for the corresponding actual argument, not the stride of the dummy argument visible to a Fortran programmer.

16) [Nick] It is unclear if one has the ability to call the MPI transfer functions with arrays of interoperable derived types as choice arguments.
--- This is possible by defining an MPI datatype and using TYPE(*) in the MPI interface.

17) [Nick] The current proposal has unnecessary restrictions and artificial distinctions between Fortran features. For example, the current proposal allows for the passing of assumed-shape arrays but not assumed-length CHARACTER.
--- It would be easy to include assumed-length characters in the design of the CFI_cdesc_t structure type. However, this would require vendors to modify their calling convention for character types, which they have been reluctant to do. This is a potentially good idea that could be included in a future standard within the current design.

18) [Nick] A design should be considered that allows potential extensibility to interfacing with the C variable argument list mechanism.
--- This was discussed in subgroup and we don't have a good mechanism to interoperate with the C varargs mechanism. Access to a C variable argument list is via macros and apparently cannot be implemented with functions.

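The point made in the reply to item 15, that C can walk a non-contiguous array section through the descriptor without a temporary copy, can be sketched as follows. This is a minimal illustration, assuming a one-dimensional section and a stride multiplier measured in bytes as in item 2. The demo_* names and the choice of ptrdiff_t for sm are assumptions of this sketch, not definitions from the TR.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical one-dimensional descriptor; names are illustrative only. */
typedef struct {
    void     *base_addr;  /* address of the first element of the section */
    size_t    elem_len;   /* element size in bytes, as given by sizeof */
    size_t    extent;     /* number of elements in the section */
    ptrdiff_t sm;         /* bytes between consecutive elements */
} demo_desc1d_t;

/* Address of element i (0-based) of the section described by d. */
void *demo_element(const demo_desc1d_t *d, size_t i)
{
    return (char *)d->base_addr + (ptrdiff_t)i * d->sm;
}

/* Sum a section of doubles in place, with no contiguous copy. */
double demo_sum(const demo_desc1d_t *d)
{
    double total = 0.0;
    for (size_t i = 0; i < d->extent; i++) {
        total += *(const double *)demo_element(d, i);
    }
    return total;
}

/* Describe the section a(1:n:step) of a contiguous array of n doubles. */
demo_desc1d_t demo_section(double *a, size_t n, size_t step)
{
    demo_desc1d_t d;
    assert(step > 0);
    d.base_addr = a;
    d.elem_len  = sizeof(double);
    d.extent    = (n + step - 1) / step;
    d.sm        = (ptrdiff_t)(step * sizeof(double));
    return d;
}
```

For an array of six doubles and a step of two, the sketch yields an extent of three and a stride multiplier of 2 * sizeof(double). The called C code sees that stride even though the Fortran dummy argument would appear to have stride one.
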
19) [Nick] At least potentially permit the companion processor to be a debugging tool (preferably passing object bounds across the interface for all arguments). Perhaps the simplest and cleanest way of resolving many interoperability issues is to extend the BIND attribute slightly, to allow the programmer to select between the current interface and a descriptor-based one, along the lines of:

    proc-language-binding-spec is
        BIND(C[,METHOD=binding_method][NAME=scalar-char-initialization-expr])
    binding_method is DIRECT or DESCRIPTOR

The default is DIRECT, and is what we have at present.
--- This is an interesting idea. It is compatible with the current design and could be proposed as an extension for a later Fortran standard.

18) [Nick] The design does not support ALLOCATABLE and POINTER arrays cleanly; in particular, the C descriptor does not contain enough information to update the pointer value correctly. Changing the base address in the descriptor alone will NOT work.
--- Fortran descriptors are only changed by calls to CFI_update_fdesc(). Vendors will have the opportunity to modify the Fortran descriptor based on information in both the C descriptor and the Fortran descriptor. Since vendors have full control over the Fortran descriptor, any additional information that may be required to do the update will be present in the Fortran descriptor. If the address of the Fortran descriptor is NULL, then fields in the vendor's Fortran descriptor can reflect the fact that it originated from outside Fortran. In that case it may not be possible to set every field in the Fortran descriptor, but it will be possible to set the essential ones.

19) [Nick] The types need to include a non-interoperable type, for use with assumed-type arguments. The existing standard already allows them to be passed between Fortran and C.
--- CFI_type_cptr and CFI_type_cfunptr will be added to the list.

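The reply to item 13 expects constants to be macros that expand to integer constant expressions, and the reply just above adds two type codes to the list. A minimal sketch of what such constants permit is given below. All DEMO_* names and values are hypothetical placeholders and are not the actual contents of ISO_Fortran_binding.h.

```c
#include <stddef.h>

/* Hypothetical type codes; names and values are placeholders only. */
#define DEMO_type_int      1
#define DEMO_type_double   2
#define DEMO_type_cptr     3
#define DEMO_type_cfunptr  4
#define DEMO_type_other   (-1)

/* Integer constant expressions can be tested by the preprocessor. */
#if DEMO_type_cptr == DEMO_type_cfunptr
#error "type codes must be distinct"
#endif

/* They are also valid as case labels. Returns the element size in
   bytes, or 0 when the size is not known from the type code alone. */
size_t demo_elem_len(int type)
{
    switch (type) {
    case DEMO_type_int:     return sizeof(int);
    case DEMO_type_double:  return sizeof(double);
    case DEMO_type_cptr:    return sizeof(void *);
    case DEMO_type_cfunptr: return sizeof(void (*)(void));
    default:                return 0;
    }
}
```

Constants declared as const-qualified objects could not be used in either the #if directive or the case labels, which is presumably why the macro wording was suggested.
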
20) [Nick] ISO_Fortran_binding.h should be specified to be unitary, as the standard C headers are.
--- A member of the C committee suggested the phrase, "Multiple inclusion of ISO_Fortran_binding.h within a translation unit shall have no effect, other than line numbers, different from just the first inclusion."

21) [Nick] The level of namespace pollution must also be specified, for future enhancements.
--- The following sentence will be added: "No names other than those specified shall be placed in the global namespace by inclusion of the file ISO_Fortran_binding.h"

22) [Nick] The descriptor functions are impure and return an error code through their function result. Because they are C interfaces, that is plausible, but it is undesirable to use a specification that conflicts with Fortran's conventions. In particular, it would obstruct a vendor or future version of the standard from defining them as interoperable procedures. That option should be left open.
--- Functions returning error codes are preferable in C. Nothing prevents vendors from making these functions interoperable.

23) [Nick] The descriptor functions use the supplied descriptors both as the source of data and where they store the results, but they do not specify which fields must be set on entry and which are set on exit.
--- The text has been modified to better describe these functions.

24) [Nick] CFI_update_fdesc is specified to use malloc, but that is an unreasonable restriction on an implementation; the requirement should be removed.
--- A function CFI_free_fdesc will be provided.

25) [Nick] CFI_allocate says "The supplied bounds override any current dimension information in the descriptors. The stride values are ignored and assumed to be one. Both the Fortran and C descriptors are updated by this function." That makes no sense, as it would leave the descriptors in an invalid state.
Also, the intent of CFI_bounds_to_cdesc is unclear, especially as it does not say that creating an invalid descriptor is forbidden.
--- That creating an invalid descriptor is forbidden is implied by "Errors might occur because values supplied in an argument are invalid for that function" in the discussion of error returns. Additional language will be added to make this stronger.

26) [Nick] "The base address in the C descriptor for a data pointer may be modified by assignment and that change later affected in the corresponding Fortran descriptor by the CFI_update_fdesc function" means that allocatable and pointer arrays can be changed other than by calls to CFI_allocate and CFI_deallocate (which is stated to be forbidden elsewhere). Something is wrong, but I don't know what.
--- As stated, data pointers may be modified by calls to CFI_update_fdesc. However, modification of the base address for descriptors with other attributes (allocatable or assumed-shape) is not allowed and will result in the return of an error code.

27) [Nick] Wording should be added to forbid the bounds array to have a higher rank than the descriptor.
--- The text will be modified to forbid this.

28) [Nick] The examples use "integer(8)" to indicate an 8-byte integer set up by a compiler option "-i8". That is completely processor-dependent, and should not be included in a Technical Report. Also, the examples use MPI names, but are wildly invalid. They should use MPI correctly, or not refer to it.
--- The examples will be changed.

29) [Jim] Although many vendors use Fortran descriptors when passing assumed-shape arrays, pointer arrays, and allocatable arrays, descriptors are not universally used by all vendors.
--- We know of no vendor that does not use descriptors to pass array meta-data. It is true that if a vendor does not currently use descriptors, this design would require them to do so for interfaces with the BIND attribute.
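The error-return convention of item 22 and the restrictions in the replies to items 26 and 27 can be illustrated together. The sketch below is hypothetical throughout: the demo_* names, error codes, attribute codes, and helper functions are assumptions made for illustration and are not the functions defined by the TR.

```c
#include <stddef.h>

#define DEMO_MAX_RANK 15   /* hypothetical maximum rank */

/* Hypothetical error and attribute codes. */
enum { DEMO_SUCCESS = 0, DEMO_INVALID_RANK = 1, DEMO_INVALID_ATTRIBUTE = 2 };
enum { DEMO_attr_assumed_shape, DEMO_attr_allocatable, DEMO_attr_pointer };

typedef struct {
    ptrdiff_t lower;
    ptrdiff_t upper;
} demo_bounds_t;

typedef struct {
    void         *base_addr;
    int           rank;       /* assumed to be at most DEMO_MAX_RANK */
    int           attribute;
    demo_bounds_t dim[DEMO_MAX_RANK];
} demo_desc_t;

/* Copy bounds into the descriptor. A bounds array of higher rank than
   the descriptor is rejected with an error code (item 27). */
int demo_set_bounds(demo_desc_t *d, const demo_bounds_t *bounds, int bounds_rank)
{
    if (bounds_rank < 0 || bounds_rank > d->rank) {
        return DEMO_INVALID_RANK;
    }
    for (int i = 0; i < bounds_rank; i++) {
        d->dim[i] = bounds[i];
    }
    return DEMO_SUCCESS;
}

/* Change the base address. Only data pointers may be changed this way;
   other attributes yield an error code (item 26). */
int demo_set_base_addr(demo_desc_t *d, void *addr)
{
    if (d->attribute != DEMO_attr_pointer) {
        return DEMO_INVALID_ATTRIBUTE;
    }
    d->base_addr = addr;
    return DEMO_SUCCESS;
}
```

In this sketch a rank-2 allocatable descriptor accepts a rank-2 bounds array, rejects a rank-3 one, and rejects any attempt to change its base address until its attribute is that of a data pointer.
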