J3/98-239
Date:        1998/11/11
To:          J3
From:        interop
Subject:     Approved syntax for interoperability
References:  J3/98-165r1, J3/98-195r1, J3/98-196r2

This paper is a reference paper that collects the previously approved syntax
for interoperability features.

Describing pre-defined C data types
-----------------------------------

An ISO_C_TYPES module shall be provided that makes accessible the following
named constants of type default integer:  C_INT, C_SHORT, C_LONG,
C_LONG_LONG, C_SIGNED_CHAR, C_FLOAT, C_DOUBLE, C_LONG_DOUBLE, C_COMPLEX,
C_DOUBLE_COMPLEX, C_LONG_DOUBLE_COMPLEX and C_CHAR.

C_INT, C_SHORT, C_LONG, C_LONG_LONG and C_SIGNED_CHAR shall have values that
specify representation methods for integers that exist on the processor or
shall have the value -1.

Because the C standard specifies that the representations for positive signed
integers are the same as the representations for corresponding values of
unsigned integers, and because Fortran will not provide any real support for
unsigned kinds of integers, we have decided not to provide C_UNSIGNED_INT,
C_UNSIGNED_SHORT, C_UNSIGNED_LONG, C_UNSIGNED_LONG_LONG or C_UNSIGNED_CHAR
constants in the ISO_C_TYPES module.  Instead a user can use the constants for
the signed kinds of integers to access the unsigned kinds as well.  Note that
this has the potentially surprising side-effect that unsigned char would be
declared as INTEGER(C_SIGNED_CHAR) in Fortran.
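
The shared representation can be observed from the C side.  The following is
a minimal sketch and not part of the approved syntax; the function names are
hypothetical.  It assumes the usual case in which a byte whose value exceeds
SCHAR_MAX reads back as a negative signed char.

```c
#include <string.h>

/* Returns 1 when a nonnegative signed char and the unsigned char holding
   the same value have identical object representations. */
int same_bits_as_unsigned(signed char s)
{
    unsigned char u;

    if (s < 0)
        return 0;           /* the C guarantee covers nonnegative values */
    u = (unsigned char)s;
    return memcmp(&s, &u, 1) == 0;
}

/* The value a Fortran INTEGER(C_SIGNED_CHAR) entity would hold for a given
   unsigned char byte: the same bits, reinterpreted as signed char. */
int view_as_signed(unsigned char u)
{
    signed char s;

    memcpy(&s, &u, 1);
    return s;
}
```

Values up to SCHAR_MAX are seen unchanged on the Fortran side; larger
unsigned values would need to be adjusted by the user.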

C_FLOAT, C_DOUBLE and C_LONG_DOUBLE shall have values that specify
approximation methods for the real type that exist on the processor or shall
have the value -1.  The values of C_COMPLEX, C_DOUBLE_COMPLEX, and
C_LONG_DOUBLE_COMPLEX shall be the same as the values of C_FLOAT, C_DOUBLE,
and C_LONG_DOUBLE, respectively.

The value of C_CHAR shall specify a representation method for characters that
exists on the processor or shall have the value -1.

A scalar entity or derived type component of a type in the Fortran type
column of the following table, whose kind type parameter has the same value
as the named constant specified in the Type kind column (made accessible from
the ISO_C_TYPES module), is said to interoperate with scalars or structure
components of C types that are compatible (ref.  C standard) with the C types
in the corresponding row of the C type column.

     +-------------------------------------------------------------+
     | Fortran type | Type kind             | C type               |
     |--------------+-----------------------+----------------------|
     | INTEGER      | C_INT                 | int                  |
     |              |                       | signed int           |
     |              |-----------------------+----------------------|
     |              | C_SHORT               | short int            |
     |              |                       | signed short int     |
     |              |-----------------------+----------------------|
     |              | C_LONG                | long int             |
     |              |                       | signed long int      |
     |              |-----------------------+----------------------|
     |              | C_LONG_LONG           | long long int        |
     |              |                       | signed long long int |
     |              |-----------------------+----------------------|
     |              | C_SIGNED_CHAR         | signed char          |
     |              |                       | unsigned char        |
     |--------------+-----------------------+----------------------|
     | REAL         | C_FLOAT               | float                |
     |              |-----------------------+----------------------|
     |              | C_DOUBLE              | double               |
     |              |-----------------------+----------------------|
     |              | C_LONG_DOUBLE         | long double          |
     |--------------+-----------------------+----------------------|
     | COMPLEX      | C_COMPLEX             | complex              |
     |              |-----------------------+----------------------|
     |              | C_DOUBLE_COMPLEX      | double complex       |
     |              |-----------------------+----------------------|
     |              | C_LONG_DOUBLE_COMPLEX | long double complex  |
     |--------------+-----------------------+----------------------|
     | CHARACTER    | C_CHAR                | char                 |
     +-------------------------------------------------------------+

So, for example, a scalar object of type integer, with a kind parameter equal
to the value of C_SHORT, interoperates with a scalar object of the C type
short or of any C type derived (via typedef) from short.

No entities other than those mentioned in this document shall be made
accessible from the ISO_C_TYPES module.  This prevents a program that is
conforming with respect to one processor from being made non-conforming with
respect to another due to names made accessible from the ISO_C_TYPES module.

C pointer types
---------------

The ISO_C_TYPES module shall make accessible an entity with the name C_PTR.
This entity shall be a derived type or a type alias name.  A Fortran scalar
entity or derived type component of type C_PTR interoperates with C scalars
or structure components that are of any C pointer type.

Note that this requires the representation method for all C pointer types to
be the same for the C processor, if it is to be the target of
interoperability of a Fortran processor.  The C standard does not impose this
requirement, so this may limit the ability of some processors to conform to
Fortran 2000.  Whether any C processors of interest actually take advantage
of this freedom needs to be determined.
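
A first, partial check of a given C processor can be made by comparing
pointer sizes.  The sketch below is only illustrative (the function names
are hypothetical).  Equal sizes are necessary but not sufficient for equal
representation methods, so a result of 1 does not by itself establish the
property required above.

```c
struct opaque;              /* incomplete type, used only through a pointer */

/* Returns 1 when several object pointer types have the size of void *. */
int object_pointer_sizes_match(void)
{
    return sizeof(int *) == sizeof(void *)
        && sizeof(double *) == sizeof(void *)
        && sizeof(char *) == sizeof(void *)
        && sizeof(struct opaque *) == sizeof(void *);
}

/* Returns 1 when a function pointer has the size of void *. */
int function_pointer_size_matches(void)
{
    return sizeof(void (*)(void)) == sizeof(void *);
}
```

Function pointers are checked separately because the C standard treats them
as a distinct category from object pointers.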

Dereferencing of C pointers within Fortran will not be supported.

Handling of structures
----------------------

A new attribute (that's not the right term, but...) is introduced for Fortran
derived type declarations.  This is the BIND(C) attribute.  For example,

    TYPE, BIND(C) :: MYFTYPE
      INTEGER(C_INT) :: I, J
      REAL(C_FLOAT) :: R
    END TYPE MYFTYPE

Such a derived type definition shall not specify the SEQUENCE statement.

A Fortran scalar object or derived type component of derived type is said to
interoperate with a C scalar object or structure component of a struct type
if the derived type definition includes the BIND(C) attribute, the derived
type and the struct type have the same number of components, and components
of the derived type interoperate with the corresponding components of the
struct type, which shall not be bit fields or arrays whose first bound is
unspecified (need correct C terminology again).

A Fortran derived type that specifies the BIND(C) attribute shall satisfy the
following.

  o It shall not be a parameterized derived type.
  o It shall not specify either the EXTENDS or the EXTENSIBLE attribute.
  o Any component that is of derived type shall be of a type that specifies
    the BIND(C) attribute as well.
  o No component shall specify the POINTER or the ALLOCATABLE attribute.

For example, a C scalar object of type myctype interoperates with a Fortran
scalar object of type myftype.

    typedef struct {
      int m, n;
      float r;
    } myctype;

Note that C9x requires the names and component names of two struct types to
be the same in order for the types to be considered to be the same.  This is
similar to Fortran's rule describing when sequence derived types are
considered to be the same type.  However, because of the problem of
mixed-case names in C, we have decided to be more forgiving.
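
Since names are not compared, the components correspond by position:  I and
J of MYFTYPE pair with m and n of myctype, and R pairs with r.  The sketch
below illustrates this positional view from the C side; the two function
names are hypothetical and exist only for the illustration.

```c
#include <stddef.h>

typedef struct {
    int m, n;
    float r;
} myctype;

/* Returns 1 when the members are laid out in declaration order, which is
   what makes a positional correspondence meaningful. */
int members_in_declaration_order(void)
{
    return offsetof(myctype, m) == 0
        && offsetof(myctype, m) < offsetof(myctype, n)
        && offsetof(myctype, n) < offsetof(myctype, r);
}

/* Fills the struct by position: first, second and third component. */
myctype make_myctype(int first, int second, float third)
{
    myctype v;

    v.m = first;
    v.n = second;
    v.r = third;
    return v;
}
```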

Note that unions and bit fields are not supported.  In addition, C structs
like the following cannot interoperate with any Fortran structure (this is a
new feature of C9x):

  struct {
    int n;
    int m[];  /* Last component has an unspecified bound, like an
                 assumed-size array */
  };

Straw vote:  Should we use BIND(C) or add an optional "(C)" to the SEQUENCE
             statement?

Result of straw vote:  BIND(C)  -  4   SEQUENCE(C)  -  4   Undecided  -  3

Handling of arrays
------------------

Because Fortran arrays are stored in column-major order, whereas C arrays are
stored in row-major order, the nth dimension of a Fortran array corresponds
to the (r-n+1)th dimension of the C array, where both arrays are of rank r.

A Fortran explicit-shape or assumed-size array object or structure component
interoperates with C objects or structure components of array types if the
elements interoperate, the ranks of the arrays are the same, and

  1. if the Fortran array is explicit-shape, the extent in each dimension is
     the same as the extent of the corresponding dimension of the C array
     (need C terminology here); or
  2. if the Fortran array is assumed-size, the extent in each dimension but
     the last is the same as the extent of the corresponding dimension of
     the C array, and the extent of the first dimension of the C array is
     unspecified.
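
The reversal of dimensions can be checked with a small sketch.  It assumes a
hypothetical Fortran declaration INTEGER(C_INT) :: A(2,3) paired with the C
declaration int a[3][2]; the function names are illustrative only.  With
that pairing, Fortran element A(i,j) and C element a[j-1][i-1] fall at the
same offset from the start of the array.

```c
#include <stddef.h>

enum { N1 = 2, N2 = 3 };    /* Fortran extents: A(N1,N2); C: a[N2][N1] */

/* Element offset of Fortran A(i,j), 1-based, column-major order. */
size_t fortran_offset(size_t i, size_t j)
{
    return (i - 1) + (j - 1) * N1;
}

/* Element offset of C a[j-1][i-1] within int a[N2][N1], row-major order,
   measured on an actual array rather than computed by formula. */
size_t c_offset(size_t i, size_t j)
{
    static int a[N2][N1];
    const char *base = (const char *)a;
    const char *elem = (const char *)&a[j - 1][i - 1];

    return (size_t)(elem - base) / sizeof(int);
}
```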

Type aliases
------------

In order to facilitate portable use of C functions that use data types
defined with C's typedef facility, a type aliasing statement is introduced to
Fortran.  The syntax is:

   <type-alias-stmt>   <<is>>   TYPEALIAS  ::  <type-alias-list>
   <type-alias>        <<is>>   <type-alias-name>  =>  <type-spec>

A type alias name can then appear as the <type-spec> (R502) in a
<type-declaration-stmt> (R501), a <component-def-stmt> (R425) or an
<implicit-spec> (R542).  Explicit or implicit declaration of an entity or
component using a type alias name is identical to declaration using the
<type-spec> for which it is an alias.  The keyword TYPE is used in
declarations.

For example,

     TYPEALIAS :: DOUBLECOMPLEX => COMPLEX(KIND(1.0D0)),      &
    &                         NEWTYPE => TYPE(DERIVED)
     TYPE(DOUBLECOMPLEX) :: X, Y
     TYPE(NEWTYPE) :: S

The type alias name can also be used as a structure constructor name, if it
is an alias for a derived type.

A type alias name shall not be the same as the name of an intrinsic type.
The type alias name declared is a local entity of class (1) (14.1.2), and can
be made accessible via use or host association.

Note that the => in the type alias statement syntax precludes making the ::
optional.  Otherwise, there would be an ambiguity with pointer assignment in
fixed source form.  Also, the TYPE keyword is required when the type alias
name is used in <type-spec>s, as there would be a potential for ambiguity in
fixed source form were it omitted:

     TYPEALIAS :: REWIND => LOGICAL
     REWIND I

Suggestions that allow the "::" to be optional and suggested alternatives to
using TYPE in declarations will be gladly entertained.

Note that the TYPEALIAS is not as flexible as C's typedef facility because
certain type modifiers of C (such as array bounds) are attributes in Fortran,
rather than being a part of the type.

Attributes of procedures and dummy arguments
--------------------------------------------

A new VALUE attribute (along with a VALUE declaration statement) is defined
for dummy data objects, and a BIND attribute is introduced for the function
and subroutine statements.  A Fortran procedure interoperates with a C
function if:

  o the procedure is declared with the BIND(C) attribute;
  o the results of the procedure and function interoperate, if the Fortran
    procedure is a function, or the result type of the C function is void,
    if the Fortran procedure is a subroutine;
  o the number of dummy arguments of the Fortran procedure is equal to the
    number of formal parameters of the C function;
  o dummy arguments with the VALUE attribute interoperate with corresponding
    formal parameters; and
  o dummy arguments without the VALUE attribute correspond to formal
    parameters that are pointers whose reference types (need correct C
    terminology here!) interoperate with the types of the corresponding
    dummy argument.

The BIND(C) attribute shall not be specified for a subroutine or function if
it requires an explicit interface or has asterisk dummy arguments.

Note that the requirement that the Fortran procedure not require an explicit
interface prohibits, among other things, dummy arguments that have the
POINTER attribute, have the ALLOCATABLE attribute or are assumed-shape
arrays, as well as array-valued function results.

Here's an example:

    BIND(C) INTEGER(C_SHORT) FUNCTION FUNC(I, J, K, L, M)
      INTEGER(C_INT), VALUE :: I
      REAL(C_DOUBLE) :: J
      INTEGER(C_INT) :: K, L(10)
      TYPE(C_PTR), VALUE :: M
    END FUNCTION FUNC

    short func(int i, double *j, int *k, int l[10], void *m);

This ties together some of the syntax specified in previous sections.  Note
that a C pointer may correspond to a Fortran dummy argument of type C_PTR, or
to a Fortran scalar that does not have the VALUE attribute.  Fortran's rules
of type checking will not be hobbled in order to provide access to C pointers
to void, and the like; instead, a C_LOC intrinsic will be introduced to get
the address of a Fortran object.
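
For illustration, a C definition matching that prototype might look like the
sketch below.  The body is invented purely to show which parameters are
copies (i and m, which have the VALUE attribute on the Fortran side) and
which refer to storage owned by the caller (j, k and l).

```c
#include <stddef.h>

short func(int i, double *j, int *k, int l[10], void *m)
{
    *j = i * 0.5;           /* visible to the caller through J */
    *k = i + 1;             /* visible to the caller through K */
    l[9] = i;               /* l is a pointer; C does not check the 10 */
    i = 0;                  /* changes only the local copy of I */
    return (short)(m != NULL);
}
```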

If an object is not a dummy argument, it shall not have the VALUE attribute.
An object shall not have the VALUE attribute if it is an array or has
INTENT(OUT) or INTENT(INOUT).  A dummy argument of type character with a
length parameter whose value is not one shall not have the VALUE attribute.

If a dummy argument in a subprogram has the VALUE attribute, it implicitly
has the INTENT(IN) attribute.  (That is, VALUE can be used in a Fortran
subprogram definition.)

Handling of characters
----------------------

All references in the following are to 98-007r2.

In C, objects that are of a character type contain a single character value.
In Fortran, entities of type character can be composed of more than one
character value.  In C, multi-character objects are created using arrays, and
in the case of strings, a specially distinguished null character marks the
end of the string.

In order to facilitate passing Fortran character entities with character
length parameters greater than one to C strings, the Fortran rules on
sequence association need to be relaxed.  Today, the following is a valid
Fortran program:

      program p
        character(10) :: a(1)
        call sub(a)
      contains
        subroutine sub(b)
          character(1) :: b(10)
        end subroutine sub
      end program p

but the following is not:

      program p
        character(10) :: a
        call sub(a)
      contains
        subroutine sub(b)
          character(1) :: b(10)
        end subroutine sub
      end program p

That is, sequence association involving characters permits arrays of
characters to blur the division between the number of array elements and the
length of each element.  This blurring applies only when the actual argument
is a character array, an element of a character array, or a substring of
such an array element.  It does not allow a scalar of type character to be
argument associated with a dummy argument that is a character array, unless
the scalar is an element of a character array.

To permit such scalars as actual arguments, the rules in 12.4.1.1 [227:13-15]
and 12.4.1.4 need to be modified.

Note:  This change could conceivably cause problems for some Fortran 95
       processor, though whether there is actually a processor that relies
       on this is unclear.

An example of this approach as it applies to interoperability:

      program p
        interface
          bind(c) subroutine myprintf(dummy)
            ! The Fortran array corresponds with the C
            ! array dummy
            character(1) :: dummy(0:12)
          end subroutine myprintf
        end interface
        character(13) :: actual = "Hello, world!"

        ! actual is a scalar that is not an array
        ! element, which is not currently allowed
        call myprintf(actual)
      end program p

      void myprintf(char dummy[13])
      { ... }
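
Note that the 13 characters passed in this example carry no terminating null
character, so the C side cannot treat dummy as a C string directly.  The
following sketch shows one way a C function might build a proper string from
them; the helper name to_c_string is hypothetical.

```c
#include <string.h>

enum { DUMMY_LEN = 13 };

/* Copies exactly DUMMY_LEN characters received from Fortran and appends the
   null character that C string functions rely on. */
void to_c_string(const char dummy[DUMMY_LEN], char out[DUMMY_LEN + 1])
{
    memcpy(out, dummy, DUMMY_LEN);
    out[DUMMY_LEN] = '\0';
}
```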

Note:  This approach was preferred in a straw vote to the alternative of
       allowing Fortran scalar characters to interoperate with C character
       arrays by a vote of 6-4-1.  An additional straw vote on whether this
       extension to sequence association should be permitted only in
       references to BIND(C) procedures passed by a vote of 5-4-2.  A third
       straw vote on whether to allow this extension to other types failed
       by a vote of 2-8-2.

In order to simplify things, subgroup decided to ignore the result of the
second straw vote.

Sample edits:
  In 5.1.2.4.4 [63:38+]
        Add    "If the actual argument is of type default
                character and is a scalar that is not an
                array element or array element substring
                designator, the size of the dummy array is
                MAX(INT(l/e), 0), where e is the length of
                an element in the dummy character array, and
                l is the length of the actual argument."

  In 12.4.1.1 [226:9-10]
        Change "or array element"
        to     "array element, or scalar"

  In 12.4.1.1 [226:11]
        After  "array"
        add    "or scalar"

  In 12.4.1.1 [227:14-15]
        Change "or a substring of such an element"
        to     "or a scalar of type default character".

  In 12.4.1.4 [228:45]
        Change "an array element substring designator"
        to     "a scalar of type default character"

  In 12.4.1.4 [229:8+]
        Add    "If the actual argument is of type default
                character and is a scalar that is not an
                array element or array element substring
                designator, the element sequence consists
                of the character storage units of the actual
                argument."
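
The size rule in the edit for 5.1.2.4.4 can be written out as a small
function.  This is only a sketch of the arithmetic; the function name is
hypothetical, and e is assumed to be positive.

```c
/* Size of the dummy array for a scalar character actual argument of length
   l when each element of the dummy array has length e (e > 0), that is,
   MAX(INT(l/e), 0). */
long dummy_array_size(long l, long e)
{
    long n = l / e;         /* integer division truncates, like INT(l/e) */

    return n > 0 ? n : 0;
}
```

For the myprintf example above, l is 13 and e is 1, so the dummy array has
13 elements; a trailing partial element is dropped, so l = 10 with e = 3
gives 3.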

Null characters
---------------

The C char and wchar_t data types have designated null characters, whose
value is zero.  The ISO_C_TYPES module shall make accessible a named
constant, C_NULLCHAR, of type character, whose kind type parameter is equal
to the value of C_CHAR.  The value of C_NULLCHAR shall be equal to the value
of the null character of the C char data type.
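
The role of the null character on the C side is sketched below:  it is what
lets C code find the end of character data.  The function name is
hypothetical, and the sketch assumes the caller knows an upper bound n on
the buffer length, as it would for a fixed-length Fortran character entity.

```c
#include <stddef.h>
#include <string.h>

/* Number of characters before the first null character in buf[0..n-1], or
   n when the buffer contains no null character. */
size_t length_up_to_null(const char *buf, size_t n)
{
    const char *p = memchr(buf, '\0', n);

    return p != NULL ? (size_t)(p - buf) : n;
}
```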

Note:  A straw vote on whether to use CHAR(0) vs. a C_NULLCHAR named
       constant showed a preference for C_NULLCHAR (2-6-3).

Length type parameters
----------------------

The character length parameter of a Fortran entity that interoperates with a
C entity shall not be an asterisk.

Global Data
-----------

In Fortran, there are two facilities for specifying global data:  common
blocks and modules.  The existing practice for sharing global data between
Fortran and C is to use common blocks with the same name as the C object with
external linkage.  The interoperability facility will build on such existing
practice.

The BIND(C) attribute can also be specified for a variable declared in the
scope of a module, with the restriction that only one variable that is
associated with a particular C variable with external linkage is permitted to
be declared within a program.

For uniformity with other attributes, a BIND(C) attribute statement will also
be introduced.

For example,

      integer, bind(c) :: i
      integer :: j, k
      common /com/ k
      bind(c) :: j, /com/

The BIND(C) attribute shall only be specified for a variable if it is
declared in the scope of a module.  The variable shall interoperate with the
C type of the associated C variable that has external linkage.  The variable
shall not be explicitly initialized, shall not have the POINTER or the
ALLOCATABLE attribute, shall not appear in an EQUIVALENCE statement, and
shall not be a member of a common block.

If a common block is given the BIND(C) attribute, it shall be given the
BIND(C) attribute in all scoping units in which it is declared.  A C variable
with external linkage interoperates with a common block that has the BIND(C)
attribute, if the C variable is of a struct type and the variables that are
members of the common block interoperate with corresponding components of the
struct type, or if the common block contains a single variable, and the
variable interoperates with the C variable.
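
Both cases can be sketched from the C side.  The common block /BLK/ and its
members N and X are hypothetical and introduced only for this illustration;
com corresponds to the single-variable common block /com/ in the example
above.  The helper blk_set is likewise hypothetical.

```c
/* Counterpart of a hypothetical  COMMON /BLK/ N, X  with BIND(C), where
   N is INTEGER(C_INT) and X is REAL(C_FLOAT). */
struct blk_layout {
    int n;
    float x;
};
struct blk_layout blk;      /* external linkage, one component per member */

/* Counterpart of a common block holding a single INTEGER(C_INT) variable:
   the C variable need not be a struct. */
int com;

/* Stores through the C name; the Fortran side would observe the values in
   the corresponding common block members. */
void blk_set(int n, float x)
{
    blk.n = n;
    blk.x = x;
}
```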

A variable in a common block with the BIND(C) attribute shall not be
explicitly initialized and it shall not be the parent object of an
<equivalence-object> in an EQUIVALENCE statement.

If a variable or common block has the BIND(C) attribute, it has the SAVE
attribute as well.

A variable with the BIND(C) attribute is a global entity of a program
(14.1.1).  Such an entity shall not be declared in more than one scoping unit
of the program.

Note that a straw poll favoured allowing BIND(C) to be specified on both
variables and common blocks: 7-4-1.

Name Mangling
-------------

Paper 98-139 contained an outline for syntax for mapping Fortran variable and
procedure names to C variable and function names.  That proposal is restated
here.

The BIND <prefix-spec> and attribute have an additional, optional specifier,
NAME=, that is followed by a scalar initialization expression of type default
character.  If neither NAME= nor BINDNAME= is specified, NAME= is assumed to
have a value that is the same as the name specified for the variable or
procedure in lower case.
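
The default can be pictured as a simple lower-casing of the Fortran name.
The sketch below illustrates that mapping only; the function name is
hypothetical, and a real processor would presumably apply it internally
rather than through a callable routine.

```c
#include <ctype.h>
#include <stddef.h>

/* Writes the lower-case form of fortran_name into out, which has capacity
   n (n > 0).  The result is truncated if necessary and is always
   null-terminated. */
void default_bind_name(const char *fortran_name, char *out, size_t n)
{
    size_t i;

    for (i = 0; i + 1 < n && fortran_name[i] != '\0'; ++i)
        out[i] = (char)tolower((unsigned char)fortran_name[i]);
    out[i] = '\0';
}
```

Under this default, a procedure written as MyFunc or MYFUNC in Fortran would
both be associated with the C name myfunc.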

Note:  This is an arbitrary choice, but it seems like the reasonable one,
       since few users of C write their functions with names that are
       entirely in upper case.

An additional BINDNAME= specifier may also appear, followed by a scalar
initialization expression of type default character.  At most one NAME=
specifier is permitted to appear in a BIND <prefix-spec> or attribute.  More
than one BINDNAME= specifier may appear in a BIND <prefix-spec> for a
subprogram, but not in the BIND <prefix-spec> for an interface body or in the
BIND attribute for a variable.
Any leading and trailing blanks in the value of a NAME= specifier are
ignored.  The value of the NAME= specifier on an interface body or variable
shall correspond to some C function or variable, respectively, with the same
name.
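
The treatment of blanks can be sketched as follows.  The function name is
hypothetical; note that, unlike the default described earlier, the case of
an explicit NAME= value is left untouched.

```c
#include <stddef.h>
#include <string.h>

/* Copies value into out (capacity n, n > 0) without its leading and
   trailing blanks.  Case is preserved; the result is truncated if
   necessary and is always null-terminated. */
void trim_name_value(const char *value, char *out, size_t n)
{
    size_t len = strlen(value);
    size_t start = 0;

    while (start < len && value[start] == ' ')
        ++start;
    while (len > start && value[len - 1] == ' ')
        --len;
    len -= start;
    if (len > n - 1)
        len = n - 1;
    memcpy(out, value + start, len);
    out[len] = '\0';
}
```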

Note:  The intent here is that NAME= allows the user to specify C names that
       are not valid Fortran names, and provides a mechanism through which
       the processor can distinguish between upper and lower case.

Section 14.1.1 states that "A name that identifies a global entity shall not
be used to identify any other global entity in the same program."  This rule
needs to be extended to make it clear that the value of the NAME= specifier
might identify a global entity, and it shall not be used to identify any
other global entity - the value isn't necessarily a name in the Fortran
sense, so some modification of the existing rule is required.

This has the effect that two external procedures might have the same name,
but still be distinct entities, because the values specified by NAME=
specifiers might be different.  For example,

      program p
        interface
          bind(c,name='CSub') subroutine c_sub
          end subroutine c_sub
        end interface

        call f_sub
      end program p

      subroutine f_sub
        interface
          bind(c,name='CSub2') subroutine c_sub
          end subroutine c_sub
        end interface
      end subroutine f_sub

The meaning ascribed to the BINDNAME= specifier is processor-dependent.

Note:  The value of the BINDNAME= specifier is intended to specify one or
       more alternative names by which a procedure defined by Fortran may be
       referenced from C, when a user wants to build a library that supports
       multiple C processors at once.  The name is a (potentially) mangled
       name, rather than the name that is actually specified in the C code.