To: J3 J3/18-240 From: William Clodius Subject: On the semantic options for templates/generics Date: 2018-August-09 On the semantic options for templates/generics William B. Clodius The Fortran standard committees, J3 and WG5, are considering the inclusion of some form of generics inspired by the popularity of C++ templates and the robustness of Ada generics. The most recent proposals, 18-112 and 18-116, are very Ada and C++ specific in their details, don't compare the two approaches, or explore the range of choices beyond those two models. I therefore thought it useful to summarize the semantic options for generics. Such a summary should facilitate the development of a generic whose semantics best match the needs of the Fortran community. In an appendix I illustrate some of the semantics with pseudo code fragments. I hope that readers of this will let me know of any mistakes or omissions so that later versions of this document will be complete and correct. I also hope they will suggest how the examples in the appendix can be improved. A generic is a parameterized subprogram, whose result is a subprogram whose semantics depend on the actual arguments to the generic. The semantics of generics depend on: 1. The classes of subprograms that are allowed to be a generic's result; 2. The classes of parameters allowed as generic arguments; 3. How the generic's parameters are identified; 4. How the parameters affect the logic in the mapping of the generic to the final sub-program; 5. The context in which the generic is defined; and 6. How the contents of generic containers are accessed. 7. How different instantiations of the same generic interact. Fortran has the following subprograms that can, in principle, be generated by a generic: the main program, modules, submodules, subroutines, and functions. With the current semantics of the main program there is little use for implementing it as a generic, but other languages make the main program effectively an execute once subprogram in the partial order determined by the use chain. This allows the internal state of their equivalent of a module to be automatically initialized. If Fortran became such a language, a main program in a parameterized module could be useful. As modules are Fortran's preferred method of encapsulation they are the natural subject for generics. Submodules are currently limited in their semantics, but, as discussed below, a generalization might make them useful instantiations of generic modules. Subroutines and functions are common subjects of generics. Even though they can be implicitly parameterized as parts of modules, it can be a convenience to have specialized generics for procedures. I would strongly encourage Fortran to adopt generics for both modules and procedures, but not main, but would find it acceptable if only generic modules were provided. Almost anything could be allowed to be an argument to a generic: types, values, variables, procedures, and even modules. Types are particularly valuable for defining containers. Integer values are useful in Fortran for specifying the kinds of the intrinsic types, and are useful in any language for specifying the sizes of structures. Floating point values can be used for such things as the fill factors of open hash tables. Variables are useful for implicitly parameterizing procedures. Variables and values can also be used to implement closures, a special case of generics. Procedure arguments can be used for mapping or reduction operations. Modules, in other languages, have semantics closer to those of Fortran's derived types rather than the namespace control of Fortran. This, I believe, makes them more useful in the other languages than in Fortran. The name space control syntax of Fortran's modules I find awkward to map to a generic argument syntax. I would encourage allowing arguments to generic modules to be any of constants, types, and procedures, but have no strong feelings about module arguments that are variables or other modules. If generic procedures are to be implemented then I believe generic arguments that are variables would be useful. (Note that as a consequence of creating generic subprograms I would expect that parameterized derived types be generalized to allow types as parameters.) Generics almost invariably involve an explicit argument list. Generics can either explicitly constrain the arguments in the list, or follow the rules of macros, in which validity checking is only performed after substitutions are made. The constraints are a burden on the implementer, and can overly constrain the use of the generic, but let the user know the proper use of the generic and allow the processor to generate clearer error messages. My preference is that generic arguments be consistently explicitly constrained. It can be useful (and confusing) to allow explicit logic and even iterations in what Fortran calls the specification part of the subprogram. It is useful in that it can allow explicit optimizations by the developer that are otherwise hard to recognize by the processor, and through checks such as assertions, bot validate that the actual arguments satisfy the pre-conditions of the subprogram and document those preconditions. As an example, consider a bit string data type. Assume a generic whose only argument is the number of bits to be used in the string. The most efficient implementation is to encode the string in the smallest kind of integer large enough to hold the number of bits. Selecting the appropriate integer requires a logical decision tree. It may be possible to implement that tree without an explicit tree, i.e. by indexing the kind value by the number of bits, but the implementation will be obscure to the maintainer. My preference would be to allow logical tests, but not iterations, to appear explicitly in the specification part of subprograms. I have a very strong preference for assertions. Fortran tries to be a define before use language. This complicates the use of generic modules. In other languages, the generic would normally be instantiated in the equivalent of the use statement. This has at least two disadvantages in Fortran. First, it makes it impractical to maintain define before use, require use of a module before the specification part of the subprogram, and instantiate a generic module using entities defined in the specification part of the module. Second, it encourages the instantiation of essentially the same module in multiple contexts resulting in code bloat and compilation cascades. The first problem can be addressed by extending the concept of submodules, and allowing a generic module to be instantiated as a "submodule" in the specification part of the subprogram using it. The second problem can be addressed by allowing a generic module to be instantiated as an "independent" module. The type of syntax implied by these options is illustrated in the appendix. My preference would be to allow "submodule statements' in the specification part of a subprogram, and independent module instantiation. One of the most important uses of generics is in the definition of containers: e.g., lists, sets, trees, and heaps. It is useful to define a uniform means of accessing the elements of a container termed an iterator. The iterator consists of two parts the iterator proper as implemented by the developer, and a construct that implicitly invokes the iterator. The iterator proper usually provides a set sequential access, but it might be useful to allow a parallel access form, though the history of the FORALL construct is discouraging. In other languages the sequential invoker usually has a form based on their FOR loops that is syntactically intuitive. but Fortran might use a DO based construct. Closely related to iterators are maps, filters, and comprehensions. Iterators and maps would often be simplified if Fortran included co-routines. My preference would be to have a specific way of defining iterators, an a relatively simple way of defining maps and filters. It can happen that the same generic is instantiated multiple times with the same parameters. This raises two questions. If the generic has internal state, do all "duplicate" instantiations share the same state or are their states unique? If the generic is a module that defines one or more derived types, are the types with the same name unique between different instantiations or identical? My current opinion is that by default instantiations with the same parameters should be the same in state and derived type definition, but the user (or developer) should have the option to specify uniqueness. Appendix: Example semantics I find it useful to illustrate some of the semantic options with examples. The examples are in a Fortran inspired pseudo-code and shouldn't be taken as a proposed concrete syntax. The options illustrated include: parametrized modules and procedures with argument constraints, the use of logical tests in the specification part of a module, the instantiation of submodules in the specification part, the instantiation of an independent module, An example of a generic procedure that I would expect would normally parameterized by variables GENERIC SUBROUTINE SWAP( X, Y ) TYPE(*). INTENT(INOUT) :: X TYPE(TYPE_OF(X)), INTENT(INOUT) :: Y TYPE(TYPE_OF(X)) :: Z INTERFACE ASSIGNMENT (=) SUBROUTINE ASSIGN(U,V) TYPE(TYPE_OF(X)), INTENT(OUT) :: U TYPE(TYPE_OF(X)), INTENT(IN) :: V END INTERFACE ASSIGNMENT Z = X X = Y Y = Z END SUBROUTINE SWAP An example of the use of SWAP REAL :: B ... SUBROUTINE :: SWAP_WITH_B( A ) REAL, INTENT(INOUT) :: A CALL SWAP( A, B ) RETURN END SUBROUTINE SWAP_WITH_B An example of a module parameterized by type, illustrating the potential, and awkwardness, of including mapping and iterator procedures. MODULE SIMPLE_LINKED_LISTS( A ) ATYPE :: A INTERFACE ASSIGNMENT (=) SUBROUTINE ASSIGN(U,V) TYPE(A), INTENT(OUT) :: U TYPE(A), INTENT(IN) :: V END INTERFACE ASSIGNMENT .... TYPE LIST GENERIC ! Usually we want generic derived types declared with ! the same parameters to be treated as equivalent ! declarations TYPE(LIST), POINTER :: LINK => NULL() TYPE(A) :: ELEMENT END TYPE LIST ... CONTAINS ... SUBROUTINE APPEND( ALIST, ELEMENT ) TYPE(LIST), INTENT(INOUT), POINTER :: ALIST TYPE(A) :: ELEMENT TYPE(LIST), POINTER :: NEXT ALLOCATE( ALIST % LINK ) ALIST % LINK % ELEMENT = ELEMENT ALIST => ALIST % LINK RETURN END SUBROUTINE APPEND ... RECURSIVE SUBROUTINE MAP_FUNCTION( OUT_LIST, IN_LIST, FUNC ) EXTERNAL FUNC ! I would prefer a different way of declaring ! FUNC an interface with ignored keywords TYPE(*) :: FUNC TYPE B_LIST GENERIC TYPE(LIST), POINTER :: LINK => NULL() TYPE(TYPE_OF(FUNC)) :: ELEMENT END TYPE B_LIST TYPE(B_LIST), INTENT(OUT), POINTER :: OUT_LIST TYPE(LIST), INTENT(IN), POINTER :: IN_LIST TYPE(B_LIST), POINTER :: DUMMY IF ( ASSOCIATED(IN_LIST) ) THEN ALLOCATE(OUT_LIST) OUT_LIST % ELEMENT = FUNC( IN_LIST % ELEMENT ) CALL MAP_FUNCTION( DUMMY, IN_LIST % LINK, FUNC ) OUT_LIST % LINK => DUMMY RETURN ELSE OUT_LIST => NULL() RETURN END IF END SUBROUTINE MAP_FUNCTION ... ITERATOR SUBROUTINE ITER( ALIST, ELEMENT, TEST ) ! TYPE(LIST), INTENT(IN), TARGET :: ALIST TYPE(A), INTENT(OUT) :: ELEMENT LOGICAL, INTENT(OUT) :: TEST TYPE(LIST), POINTER, SAVE :: BASE => NULL() TYPE(LIST), POINTER, SAVE :: NEXT => NULL() TYPE(LIST), POIINTER, SAVE :: CURRENT => NULL() IF ( ASSOCIATED( ALIST, BASE) ) THEN CURRENT => NEXT ELSE CURRENT => ALIST END IF IF ( ASSOCIATED( CURRENT ) THEN ELEMENT = CURRENT % ELEMENT NEXT => CURRENT % LINK TEST = .TRUE, ELSE TEST = .FALSE. BASE => NULL() END IF RETURN END SUBROUTINE ITER ... END MODULE SIMPLE_LINKED_LISTS An example of a module parameterized by an integer value benefiting by logical chains in the specification part MODULE BIT_STRINGS( BITS ) USE, INTRINSIC :: ISO_FORTRAN_ENV INTEGER, LEN :: BITS ASSERT( BITS <= 64 .AND. BITS >= 1 ) IF ( BITS <= 8 ) THEN INTEGER, PARAMETER : BIT_KIND = INT8 ELSE IF ( BITS <= 16 ) THEN INTEGER, PARAMETER : BIT_KIND = INT16 ELSE IF ( BITS <= 32 ) THEN INTEGER, PARAMETER : BIT_KIND = INT32 ELSE IF ( BITS <= 64 ) THEN INTEGER, PARAMETER :: BIT_KIND = INT64 END IF .... TYPE BIT_STRING INTEGER( BIT_KIND ) :: STRING END TYPE BIT_STRING ... END MODULE BIT_STRING An example of a module parameterized by an integer value using a work around for logical chains in the specification part MODULE BIT_STRINGS( BITS ) USE, INTRINSIC :: ISO_FORTRAN_ENV INTEGER, LEN :: BITS ASSERT( BITS <= 64 .AND. BITS >= 1 ) INTEGER, PARAMETER :: & KINDS(8) = [INT8, INT16, INT32, INT32, INT64, INT64, INT64, INT64] INTEGER, PARAMETER : BIT_KIND = KINDS( (BITS-1)/8+1) .... TYPE BIT_STRING INTEGER( BIT_KIND ) :: STRING END TYPE BIT_STRING ... END MODULE BIT_STRING Examples of possible means of initiating a generic module. 1. If the arguments are defined win other modules order the use statements so that define before use is preserved ... USE BIT_STRINGS(30) USE SIMPLE_LINKED_LIST( BIT_STRING ) ... 2. If the arguments are defined in the specification part allow use before definition ... USE SIMPLE_LINKED_LIST( A ) ... TYPE A ... END TYPE A ... 3. If the arguments are defined in the specification part allow use in the specification part ... USE BIT_STRING( 30 ) ... TYPE A ... END TYPE A ... USE SIMPLE_LINKED_LIST( A ) ... 4. If the arguments are defined in the specification part allow instantiation of a submodule in the specification part ... USE BIT_STRING( 30 ) ... TYPE A ... END TYPE A ... SUBMODULE SIMPLE_LINKED_LIST( A ) ... 5. Allowing instantiation as a named module MODULE BIT_LISTS = SIMPLE_LINKED_LISTS( BIT_STRING ): LIST => BIT_LIST USE BIT_STRINGS END MODULE BIT_LISTS