To: J3                                                       10-122
Subject: Reply to comments on interop TR ballot
From: Craig Rasmussen
Date: 2010 February 04
References: N1766, N1763, N1761, 06-128r1, WG5 worklist item j3-041


A letter ballot N1763 was taken regarding the technical content of N1761,
"Further Interoperability of Fortran with C" (Interoperability of optional,
assumed-shape, allocatable, pointer, assumed-type, and assumed-rank dummy
arguments).  Several people replied to this ballot with comments.  This
paper is a reply to some of these comments.

The format of the reply below is:

n) [Commentor(s)] Comment, possibly paraphrased

---


Reply text:

1) [Van] A C descriptor has a flag that indicates whether a pointer is
   or is not associated.  It's not always possible to tell.  It is
   always possible to tell whether a pointer is or is not
   disassociated.

---

I believe the issue here is that "not associated" covers two cases - the
pointer association status is disassociated or undefined.  It is
fairly easy to keep track of the actions that cause a pointer to
become disassociated, but not for an association status of undefined.

Having a binary flag that indicated disassociated would just migrate
the problem to the "not disassociated" state that would include
associated or undefined.  In addition, knowing that a pointer is
disassociated is not really interesting to the user. Rather the user
probably wants to know if the pointer is associated.

Before we added this flag, we just used a base address of not NULL as
an indication that the pointer was associated.

One option is to just delete this flag and go back to the old
convention.

Edits: TBD


2) [Van] The "sm" component of a CFI_dim_t struct is specified to be
   measured in bytes.  This should be in processor-dependent units.

---

The "sm" (stride multiplier) is intended to have the same units as the
the units of the elem_len component of the C descriptor, and hence the
units of the C sizeof operator. The C standard (6.5.3.4) specifies
this unit to be "in bytes". Changing the units of the stride
multiplier to be potentially different seems like an unnecessary
complication.


3) [Aleks, Nick] Use a flexible array member at the end of the cdesc
   struct to hold the bounds triples, rather than a fixed-size array.

---

The use of a fixed dim length allows usage as shown below:

    CFI_cdesc_t cdesc;
    /* set cdesc members */
    cdesc.fdesc = NULL;
    ...
    /* update the Fortran descriptor */
    CFI_update_fdesc (&CFI_cdesc_t);

We had discussed this option in subgroup.  The alternative would be to
provide alloc/free routines for the C descriptor.  This invites memory
leaks and makes the use of the feature unnecessarily complicated.  The
advantage of the flexible array member is a saving of memory space for
most cases.  The chosen alternative is simpler and less error prone at
the cost of a few extra bytes per descriptor.


4) [Mike] The file ISO_Fortran_binding.h is not included in the
   standard.

---

One of the ground rules for the TR is that no struct can be explicitly
specified in the standard. Instead, only descriptions of the struct
members are allowed.  Including a copy of ISO_Fortran_binding.h would
expose a suggested implementation of the cdesc struct, in violation of
this rule. Also, generally the standard does not mandate specific
implementations of things like this. Some vendors might, for example,
want to add extra, hidden members to the struct for various purposes.


5) [Mike] The general approach of specifying a data structure seems
   awkward compared to the approach of standardizing the particular
   set of APIs that the programmers will want.

---

The initial version of the TR was based on a set of functions / macros
that served as such an API.  This quickly grew into an MPI-like
monstrosity that would have been hard to use a best.  We switched to
the descriptor approach as cleaner and simpler to use.  Also,
minimizing the number of needed functions lowers the implementation and
maintenance costs for vendors.


6) [Mike] There is no "version" query for the structure or API.

---

Vendors are allowed to add extra members to the cdesc struct if they
want.  A version number might be a candidate. Some vendors already
encode version information in modules to avoid
incompatibilities. These are not visible to the user.  Similarly, it
seems reasonable to keep a similar version for the descriptor
invisible.


7) [Mike] There is no pointer in the Fortran descriptor pointing to a
   corresponding C descriptor, resulting in an asymmetry.

---

The Fortran and C descriptors are not intended to be symmetric. For
many implementations the Fortran descriptor will be identical to the
current vendor's dope vector.  Requiring an extra field that serves no
purpose creates needless work that would delay and discourage
implementation.


8) [Nick] A "pointer of type void" is meaningless - C has no data
   type "void". The terminology should be fixed, everywhere the
   wording is used.

---

In 6.2.5 Types, paragraph 19 of the C standard, "The void type..." is
described. So I don't feel too bad about this usage.  Elsewhere in
the C standard, the phrase "pointer to void" is used and this usage
is preferred in a subsequent conversation with Nick.


9) [Nick] "The C descriptor is a struct of type CFI_desc_t" is
   ambiguous.

---

The intention is that the struct be defined as

typedef { .......  } CFI_cdesc_t;

and declarations of descriptors of the form

CFI_cdesc_t  arg;

Improved wording is: "CFI_desc_t is a named structure type defined
by a typedef.  It can be used in variable declarations contexts
such as "CFI_cdesc_t arg;""


10) [Nick] The specification of the "attribute" field is broken. Inter
    alia, it mixes type properties with attributes, does not allow
    assumed-shape arrays to be allocatable or pointers, and makes no
    reference to assumed-rank.  I am not sure what it is proposing, or
    why. If it is needed, it should specify the attributes as bits in
    a mask.

---

Regarding the first sentence, I certainly hope we do not provide for
assumed-shape arrays that are allocatable or pointers. No such things
exist in Fortran, and we certainly don't want to create such as part of
the TR.

The final sentence does touch on a topic we had discussed
before. Should some of the "flag" type data in the cdesc be packed
into a bit mask, rather than being separate members of the struct?
The advantage of the bit mask version is that there would be fewer
members (tiny space savings) and that the unused bits in the mask
could be used for later expansion (the entity is a coarray, for
example) without affecting the overall memory layout of the struct
(potentially more valuable benefit).  The drawback is that users would
have to extract the bits from the mask to get the desired
information. We could supply constant masks that could be anded with
the mask word in the cdesc to extract the values. For programmers like
me this seems perfectly natural. Another, worse, alternative is to
pile on yet more functions / macros to extract the various bits.  The
consensus during design of the struct is that the simple, separate
members were easier to use and preferred.


11) [Nick] The description of the "state" field is baffling. Why
    should assumed-shape but not assumed-rank be included together
    with associated pointers and allocatable variables?

---

Assumed-shape variables, associated pointers, and allocatable
variables are three non-overlapping categories of
variables. Assumed-rank (and assumed-type) are concepts that apply
only in the interface specified in the Fortran domain.  Whether the
corresponding Fortran interface for the function declared an argument
to be assumed-rank or assumed-type is invisible to the C function. The
incoming Fortran descriptor, and corresponding C descriptor will have
the type and rank specified, based on the actual argument in the call.


12) [Nick] The type "intptr_t" is unsuitable for array
    indexing... "ptrdiff_t is a far better type for the purpose.

---

Members of the C committee (WG14) were polled and it was suggested that
size_t be used for variables that are count related.


13) [Nick] The specification must say which kind of C constants are
    defined in the file ISO_Fortran_binding.h, as C has several, with
    different properties. That would need careful design.  Most of
    them should be preprocessor constants suitable for use in #if
    directives, but perhaps not all.

---

It is expected that these constants will be defined as macros in
ISO_Fortran_binding.h, for example,

#define CFI_xxx  value

A member of the C committee WG14 has suggested the phrase,
"CFI_xxx is a macro defined in the file ISO_Fortran_binding.h
that expands to an integer constant expression."


14) [Nick] The layout of C structures is very dependent on the
    compiler options. In the existing Fortran standard, the processor
    can (in theory, at least) remap structures. Because N1761 proposes
    structures to use as the actual interface descriptors, that is not
    feasible.  The matter needs consideration.

---

The Fortran descriptors that are passed into a C function are
internally created by the compiler which has control over the layout.
The internal structure of the Fortran descriptor in the C domain is
visible only to the CFI_xxx functions supplied by the vendor. I would
assume these are pre-compiled as part of a library and the vendor has
complete control over the options used for compilation.  The layout of
the C descriptor as seen by the library routines would have a
particular form.  So you are saying that the user could compile his
code with other options that would lead to an incompatible layout?  If
that's the case, how do users handle the raft of system header files
with struct definitions?  I've not seen any instructions in man pages
saying the user had to specify certain compiler options if system
header files are used.  Could you explain, with an example, what the
real problem is here?


15) [Nick] The description of stride "equal to the difference between
    the subscript values of consecutive elements of an array along a
    specified dimension" makes no sense when applied to a dummy
    argument - it is always one. Fortran has no concept of the strides
    of the actual arguments being visible to the called procedure."

---

True (modulo playing games with C_LOC()) for the case of Fortran
calling Fortran.  However, C needs access to non-contiguous array
sections without resorting to temporary copies.  The stride
referred is that obtained from the descriptor that the C programmer
sees for the corresponding actual argument, not the stride of the
dummy argument visible to a Fortran programmer.


16) [Nick] It is unclear is one has the ability to call the MPI
    transfer functions with arrays of interoperable derived types as
    choice arguments.

---

This is possible by defining an MPI datatype and using TYPE(*) in the MPI
interface.


17) [Nick] The current proposal has unnecessary restrictions and
    artificial distinctions between Fortran features.  For example,
    the current proposal allows for the passing of assumed-shape
    arrays but not assumed-length CHARACTER.

---

It would be easy to include assumed-length characters in the design of
the CFI_desc_t structure type.  However this would require vendors to
modify their calling convention for character types, which they have
been reluctant to do.  This is a potentially good idea that could be
included in a future standard within the current design.


18) [Nick] A design should be considered that allows potential
    extensibility to interfacing with the C variable argument list
    mechanism.

---

This was discussed in subgroup and we don't have a good mechanism to
interoperate with the C varargs mechanism.  Access to a C variable
argument list is via macros and apparently cannot be implemented with
functions.

19) [Nick]) At least potentially permit the companion processor to be
    a debugging tool (preferably passing object bounds across the
    interface for all arguments).  Perhaps the simplest and cleanest
    way of resolving many interoperability issues is to extend the
    BIND attribute slightly, to allow the programmer to select between
    the current interface and a descriptor-based one, along the lines
    of:

proc-language-binding-spec is
    BIND(C[,METHOD=binding_method][NAME=scalar-char-initialization-expr])

binding_method is DIRECT
               or DESCRIPTOR

The default is DIRECT, and is what we have at present.

---

This is an interesting idea.  It is compatible with the current design
and could be proposed as an extension for a later Fortran standard.


18) [Nick] The design does not support ALLOCATABLE and POINTER arrays
    cleanly; in particular, the C descriptor does not contain enough
    information to update the pointer value correctly.  Changing the
    base address in the descriptor alone will NOT work.

---

Fortran descriptors are only changed by calls to CFI_update_fdesc().
Vendors will have the opportunity to modify the Fortran descriptor
based on information in both the C descriptor and the Fortran
descriptor.  Since vendors have full control over the Fortran
descriptor, any additional information that may be required to do the
update will be present in the Fortran descriptor.  If the address of
the Fortran descriptor is NULL, then fields in the vendor's Fortran
descriptor can reflect the fact that it originated from outside
Fortran.  However, it may not be possible for all fields in a
Fortran descriptor to be set.  It will be possible to set essential
fields.


19) [Nick] The types need to include a non-interoperable type, for use
    with assumed-type arguments.  The existing standard already allows
    them to be passed between Fortran and C.

---

CFI_type_cptr and CFI_type_cfunptr will be added to the list.


20) [Nick] The specification of ISO_Fortran_binding.h should be
    specified to be unitary as the standard C headers are.

---

A member of the C committee suggested that the phase, "Multiple inclusion
of ISO_Fortran_binding.h within a translation unit shall have no effect,
other than line numbers, different from just the first inclusion."


21) [Nick] The level of namespace pollution must also be specified,
    for future enhancements.

---

The following sentence will be added: "No names other than those
specified shall be placed in the global namespace by inclusion of the
file ISO_Fortran_binding.h"


22) [Nick] The descriptor functions are impure and return an error
    code through their function result.  Because they are C
    interfaces, that is plausible, but it is undesirable to use a
    specification that conflicts with Fortran's conventions.  In
    particular, it would obstruct a vendor or future version of the
    standard from defining them as interoperable procedures.  That
    option should be left open.

---

Functions returning error codes are preferable in C.  There is no
restriction on allowing vendors to make these functions interoperable.


23) [Nick] The descriptor functions use the supplied descriptors both
    as the source of data and where they store the results, but they
    do not specify which fields must be set on entry and which are set
    on exit.

---

The text has been modified to better describe these functions.


24) [Nick] CFI_update_fdesc is specified to use malloc, but that is an
    unreasonable restriction on an implementation; the requirement
    should be removed.

---

A function CFI_free_fdesc will be provided.


25) [Nick] CFI_allocate says "The supplied bounds override any current
    dimension information in the descriptors. The stride values are
    ignored and assumed to be one. Both the Fortran and C descriptors
    are updated by this function."  That makes no sense, as it would
    leave the descriptors in an invalid state.  Also, the intent of
    CFI_bounds_to_cdesc is unclear, especially as it does not say that
    creating an invalid descriptor is forbidden.

---

Stating that creating an invalid descriptor is forbidden is implied by
"Errors might occur because values supplied in an argument are invalid
for that function" in discussion of error returns.  Additional language
will be added to make this stronger.


26) [Nick] "The base address in the C descriptor for a data pointer
    may be modified by assignment and that change later affected in
    the corresponding Fortran descriptor by the CFI_update_fdesc
    function" means that allocatable and pointer arrays can be changed
    other than by calls to CFI_allocate and CFI_deallocate (which is
    stated to be forbidden elsewhere).  Something is wrong, but I
    don't know what.

---

As stated, data pointers may be modified by calls to CFI_update_fdesc.
However, modification of the base address for descriptors with other
attributes (allocatable or assumed-shape) is not allowed
and will result the return of an error code.


27) [Nick] Wording should be added to forbid the bounds array to have
    a higher rank than the descriptor.

---

The text will be modified to forbid this.


28) [Nick] The examples use "integer(8)" to indicate an 8-byte integer
    set up by a compiler option "-i8".  That is completely
    processor-dependent, and should not be included in a Technical
    Report.  Also, the examples use MPI names, but are wildly invalid.
    They should use MPI correctly, or not refer to it.

---

The examples will be changed.


29) [Jim] Although Fortran descriptors are used when passing
    assumed-shape arrays, pointer arrays, and allocatable arrays by
    many vendors, they however are not universally used by all
    vendors.

---

We know of no vendor that does not use descriptors to pass array
meta-data.  It is true that if a vendor does not currently
use descriptors, this design would require them to for interfaces
with the BIND attribute.