To: J3                                                     J3/18-240
From: William Clodius
Subject: On the semantic options for templates/generics
Date: 2018-August-09

On the semantic options for templates/generics
William B. Clodius

The Fortran standard committees, J3 and WG5, are considering the
inclusion of some form of generics inspired by the popularity of C++
templates and the robustness of Ada generics. The most recent proposals,
18-112 and 18-116, are very Ada and C++ specific in their details, don't
compare the two approaches, or explore the range of choices beyond those
two models. I therefore thought it useful to summarize the semantic
options for generics. Such a summary should facilitate the development
of a generic whose semantics best match the needs of the Fortran
community. In an appendix I illustrate some of the semantics with pseudo
code fragments. I hope that readers of this will let me know of any
mistakes or omissions so that later versions of this document will be
complete and correct. I also hope they will suggest how the examples in
the appendix can be improved.

A generic is a parameterized subprogram, whose result is a subprogram
whose semantics depend on the actual arguments to the generic. The
semantics of generics depend on:
    1. The classes of subprograms that are allowed to be a generic's
        result;
    2. The classes of parameters allowed as generic arguments;
    3. How the generic's parameters are identified;
    4. How the parameters affect the logic in the mapping of the generic
        to the final sub-program;
    5. The context in which the generic is defined; and
    6. How the contents of generic containers are accessed.
    7. How different instantiations of the same generic interact.

Fortran has the following subprograms that can, in principle, be
generated by a generic: the main program, modules, submodules,
subroutines, and functions. With the current semantics of the main
program there is little use for implementing it as a generic, but other
languages make the main program effectively an execute once subprogram
in the partial order determined by the use chain. This allows the
internal state of their equivalent of a module to be automatically
initialized. If Fortran became such a language, a main program in a
parameterized module could be useful. As modules are Fortran's preferred
method of encapsulation they are the natural subject for
generics. Submodules are currently limited in their semantics, but, as
discussed below, a generalization might make them useful instantiations
of generic modules. Subroutines and functions are common subjects of
generics. Even though they can be implicitly parameterized as parts of
modules, it can be a convenience to have specialized generics for
procedures. I would strongly encourage Fortran to adopt generics for
both modules and procedures, but not main, but would find it acceptable
if only generic modules were provided.

Almost anything could be allowed to be an argument to a generic: types,
values, variables, procedures, and even modules. Types are particularly
valuable for defining containers. Integer values are useful in Fortran
for specifying the kinds of the intrinsic types, and are useful in any
language for specifying the sizes of structures. Floating point values
can be used for such things as the fill factors of open hash
tables. Variables are useful for implicitly parameterizing
procedures. Variables and values can also be used to implement closures,
a special case of generics. Procedure arguments can be used for mapping
or reduction operations. Modules, in other languages, have semantics
closer to those of Fortran's derived types rather than the namespace
control of Fortran. This, I believe, makes them more useful in the other
languages than in Fortran. The name space control syntax of Fortran's
modules I find awkward to map to a generic argument syntax. I would
encourage allowing arguments to generic modules to be any of constants,
types, and procedures, but have no strong feelings about module
arguments that are variables or other modules. If generic procedures are
to be implemented then I believe generic arguments that are variables
would be useful. (Note that as a consequence of creating generic
subprograms I would expect that parameterized derived types be
generalized to allow types as parameters.)

Generics almost invariably involve an explicit argument list. Generics
can either explicitly constrain the arguments in the list, or follow the
rules of macros, in which validity checking is only performed after
substitutions are made. The constraints are a burden on the implementer,
and can overly constrain the use of the generic, but let the user know
the proper use of the generic and allow the processor to generate clearer
error messages. My preference is that generic arguments be consistently
explicitly constrained.

It can be useful (and confusing) to allow explicit logic and even
iterations in what Fortran calls the specification part of the
subprogram. It is useful in that it can allow explicit optimizations by
the developer that are otherwise hard to recognize by the processor, and
through checks such as assertions, bot validate that the actual
arguments satisfy the pre-conditions of the subprogram and document
those preconditions. As an example, consider a bit string data
type. Assume a generic whose only argument is the number of bits to be
used in the string. The most efficient implementation is to encode the
string in the smallest kind of integer large enough to hold the number
of bits. Selecting the appropriate integer requires a logical decision
tree. It may be possible to implement that tree without an explicit
tree, i.e. by indexing the kind value by the number of bits, but the
implementation will be obscure to the maintainer. My preference would be
to allow logical tests, but not iterations, to appear explicitly in the
specification part of subprograms. I have a very strong preference for
assertions.

Fortran tries to be a define before use language. This complicates the
use of generic modules. In other languages, the generic would normally
be instantiated in the equivalent of the use statement. This has at
least two disadvantages in Fortran. First, it makes it impractical to
maintain define before use, require use of a module before the
specification part of the subprogram, and instantiate a generic module
using entities defined in the specification part of the module. Second,
it encourages the instantiation of essentially the same module in
multiple contexts resulting in code bloat and compilation cascades. The
first problem can be addressed by extending the concept of submodules,
and allowing a generic module to be instantiated as a "submodule" in the
specification part of the subprogram using it. The second problem can be
addressed by allowing a generic module to be instantiated as an
"independent" module. The type of syntax implied by these options is
illustrated in the appendix. My preference would be to allow "submodule
statements' in the specification part of a subprogram, and independent
module instantiation.

One of the most important uses of generics is in the definition of
containers: e.g., lists, sets, trees, and heaps. It is useful to define
a uniform means of accessing the elements of a container termed an
iterator. The iterator consists of two parts the iterator proper as
implemented by the developer, and a construct that implicitly invokes
the iterator. The iterator proper usually provides a set sequential
access, but it might be useful to allow a parallel access form, though
the history of the FORALL construct is discouraging. In other languages
the sequential invoker usually has a form based on their FOR loops that
is syntactically intuitive. but Fortran might use a DO based construct.
Closely related to iterators are maps, filters, and
comprehensions. Iterators and maps would often be simplified if Fortran
included co-routines. My preference would be to have a specific way of
defining iterators, an a relatively simple way of defining maps and
filters.

It can happen that the same generic is instantiated multiple times with
the same parameters. This raises two questions. If the generic has
internal state, do all "duplicate" instantiations share the same state or
are their states unique? If the generic is a module that defines one or
more derived types, are the types with the same name unique between
different instantiations or identical? My current opinion is that by
default instantiations with the same parameters should be the same in
state and derived type definition, but the user (or developer) should
have the option to specify uniqueness.

Appendix: Example semantics

I find it useful to illustrate some of the semantic options with
examples. The examples are in a Fortran inspired pseudo-code and
shouldn't be taken as a proposed concrete syntax. The options
illustrated include: parametrized modules and procedures with argument
constraints, the use of logical tests in the specification part of a
module, the instantiation of submodules in the specification part,
the instantiation of an independent module,


An example of a generic procedure that I would expect would normally
parameterized by variables

    GENERIC SUBROUTINE SWAP( X, Y )
        TYPE(*). INTENT(INOUT)                 :: X
        TYPE(TYPE_OF(X)), INTENT(INOUT) :: Y
        TYPE(TYPE_OF(X))                           :: Z
        INTERFACE ASSIGNMENT (=)
            SUBROUTINE ASSIGN(U,V)
            TYPE(TYPE_OF(X)), INTENT(OUT) :: U
            TYPE(TYPE_OF(X)), INTENT(IN)    :: V
        END INTERFACE ASSIGNMENT
        Z = X
        X = Y
        Y = Z
    END SUBROUTINE SWAP


An example of the use of SWAP

    REAL :: B
...
    SUBROUTINE :: SWAP_WITH_B( A )
        REAL, INTENT(INOUT) :: A
        CALL SWAP( A, B )
        RETURN
    END SUBROUTINE SWAP_WITH_B


An example of a module parameterized by type, illustrating the
potential, and awkwardness, of including mapping and iterator
procedures.

    MODULE SIMPLE_LINKED_LISTS( A )
        ATYPE :: A
        INTERFACE ASSIGNMENT (=)
            SUBROUTINE ASSIGN(U,V)
            TYPE(A), INTENT(OUT) :: U
            TYPE(A), INTENT(IN)    :: V
        END INTERFACE ASSIGNMENT
       ....
        TYPE LIST
            GENERIC ! Usually we want generic derived types declared with
                    ! the same parameters to be treated as equivalent
                    ! declarations

           TYPE(LIST), POINTER :: LINK => NULL()
	   TYPE(A)                     :: ELEMENT
       END TYPE LIST
...
    CONTAINS
...
        SUBROUTINE APPEND( ALIST, ELEMENT )
            TYPE(LIST), INTENT(INOUT), POINTER :: ALIST
            TYPE(A)                                     :: ELEMENT
            TYPE(LIST), POINTER :: NEXT

            ALLOCATE( ALIST %  LINK )
            ALIST % LINK % ELEMENT = ELEMENT
            ALIST => ALIST % LINK
            RETURN
        END SUBROUTINE APPEND
...

        RECURSIVE SUBROUTINE MAP_FUNCTION( OUT_LIST, IN_LIST, FUNC )
            EXTERNAL FUNC ! I would prefer a different way of declaring
                          ! FUNC an interface with ignored keywords
            TYPE(*) :: FUNC
            TYPE B_LIST
               GENERIC
               TYPE(LIST), POINTER :: LINK => NULL()
	       TYPE(TYPE_OF(FUNC))                     :: ELEMENT
            END TYPE B_LIST
            TYPE(B_LIST), INTENT(OUT), POINTER :: OUT_LIST
            TYPE(LIST), INTENT(IN), POINTER :: IN_LIST

            TYPE(B_LIST), POINTER :: DUMMY

             IF ( ASSOCIATED(IN_LIST) ) THEN
                ALLOCATE(OUT_LIST)
                OUT_LIST % ELEMENT = FUNC( IN_LIST % ELEMENT )
                CALL MAP_FUNCTION( DUMMY, IN_LIST % LINK, FUNC )
                OUT_LIST % LINK => DUMMY
                RETURN

            ELSE
                OUT_LIST => NULL()
                RETURN

            END IF

        END SUBROUTINE MAP_FUNCTION
...
         ITERATOR SUBROUTINE ITER( ALIST, ELEMENT, TEST )
!
            TYPE(LIST), INTENT(IN), TARGET :: ALIST
            TYPE(A), INTENT(OUT) :: ELEMENT
            LOGICAL, INTENT(OUT) :: TEST
            TYPE(LIST), POINTER, SAVE :: BASE => NULL()
            TYPE(LIST), POINTER, SAVE :: NEXT => NULL()
            TYPE(LIST), POIINTER, SAVE :: CURRENT => NULL()

            IF ( ASSOCIATED( ALIST, BASE) ) THEN
                CURRENT => NEXT
            ELSE
                CURRENT => ALIST
            END IF
            IF ( ASSOCIATED( CURRENT ) THEN
                ELEMENT = CURRENT % ELEMENT
                NEXT => CURRENT % LINK
                TEST = .TRUE,
            ELSE
                TEST = .FALSE.
                BASE => NULL()
            END IF
            RETURN
        END SUBROUTINE ITER
...
    END MODULE SIMPLE_LINKED_LISTS


An example of a module parameterized by an integer value benefiting by
logical chains in the specification part

    MODULE BIT_STRINGS( BITS )

        USE, INTRINSIC :: ISO_FORTRAN_ENV

        INTEGER, LEN :: BITS

        ASSERT( BITS <= 64 .AND. BITS >= 1 )

        IF ( BITS <= 8 ) THEN
            INTEGER, PARAMETER : BIT_KIND = INT8
        ELSE IF ( BITS <= 16 ) THEN
            INTEGER, PARAMETER : BIT_KIND = INT16
        ELSE IF ( BITS <= 32 ) THEN
            INTEGER, PARAMETER : BIT_KIND = INT32
        ELSE IF ( BITS <= 64 ) THEN
            INTEGER, PARAMETER :: BIT_KIND = INT64
        END IF

       ....
       TYPE BIT_STRING
	   INTEGER( BIT_KIND ) :: STRING
       END TYPE BIT_STRING
        ...

    END MODULE BIT_STRING


An example of a module parameterized by an integer value using a work
around for logical chains in the specification part

    MODULE BIT_STRINGS( BITS )

        USE, INTRINSIC :: ISO_FORTRAN_ENV

        INTEGER, LEN :: BITS

        ASSERT( BITS <= 64 .AND. BITS >= 1 )

        INTEGER, PARAMETER :: &
        KINDS(8) = [INT8, INT16, INT32, INT32, INT64, INT64, INT64, INT64]
        INTEGER, PARAMETER : BIT_KIND = KINDS( (BITS-1)/8+1)
       ....
       TYPE BIT_STRING
	   INTEGER( BIT_KIND ) :: STRING
       END TYPE BIT_STRING
        ...

    END MODULE BIT_STRING


Examples of possible means of initiating a generic module.

1. If the arguments are defined win other modules order the use
statements so that define before use is preserved
...
USE BIT_STRINGS(30)
USE SIMPLE_LINKED_LIST( BIT_STRING )
...

2. If the arguments are defined in the specification part allow use
before definition
...
    USE SIMPLE_LINKED_LIST( A )
...
    TYPE A
...
    END TYPE A
...

3. If the arguments are defined in the specification part allow use in
the specification part
...
    USE BIT_STRING( 30 )
...
    TYPE A
...
    END TYPE A
...
    USE SIMPLE_LINKED_LIST( A )
...


4. If the arguments are defined in the specification part allow
instantiation of a submodule in the specification part
...
    USE BIT_STRING( 30 )
...
    TYPE A
...
    END TYPE A
...
    SUBMODULE SIMPLE_LINKED_LIST( A )
...

5.  Allowing instantiation as a named module

    MODULE BIT_LISTS = SIMPLE_LINKED_LISTS( BIT_STRING ): LIST => BIT_LIST
        USE BIT_STRINGS
    END MODULE BIT_LISTS