To: J3                                                     J3/18-110
From: Tom Clune
Subject: Use case for containers
Date: 2018-February-04


The contents of this paper are in paper J3/##-###.xxx


Introduction:
-------------

1. As scientific models become more complex, an increasing fraction of
   the lines of code are infrastructure as-opposed to direct
   numerical calculation.  Examples of infrastructure include software
   layers that couple independently developed subsystems, frameworks
   for managing distributed parallelism, advanced I/O (e.g.,
   checkpoint/restart via NetCDF), etc.  Generally these
   infrastructure layers evolve as an attempt to avoid code
   duplication as multiple parts of the system require similar
   functionality.

2. Software "containers" are abstractions that enable aggregating
   groups of related entities for convenient and efficient access.
   Many categories of software containers have been defined, with each
   category generally tailored to a common design issue.  By providing a
   "standard" and reliable means to perform commonly occurring
   operations, containers can greatly simplify design and
   implementation of complex algorithms.

3. Fortran provides just one category of software container - Array.
   Array containers are designed to optimize random access to a
   _fixed_ collection of elements, and generally requires all
   contained elements to be of the same dynamic type.  Fortran
   provides excellent mechanism for declaring and constructing arrays
   of any type as well as for accessing, storing, and modifying array
   members (elements) with succinct, tailored syntax: tuples of
   indices within parens:

   a(i,j) = x
   x = a(i,j)
   a(i,j) = a(i,j)**2

4. Other commonly used container categories include Set, Vector (or
   List), Map (or Associative Array, Dictionary), Stack, Queue, etc.
   Unlike Array containers, these others generally are more dynamic -
   growing as necessary when new elements are added to the container.
   Here I briefly describe some of the most commonly categories of
   containers.

   a. Vectors are somewhat similar to 1-D Arrays in that the provide
      efficient random accesss.  But whereas the size of an Array is
      fixed at the time of its construction, a Vector can grow as
      elements are added.  (For those unfamiliar with containers,
      there is no magic here.  Internally arrays are reallocated with
      copies but in a manner that adding elements is of O(1)
      complexity on average.)

   b. Set containers provide efficient (O(log n)) mechanisms for
      searching and inserting distinct elements that can be ordered
      according to some criterion.  Typical implementations use a
      balanced binary tree data structure to store the elements.

   c. Maps are similar to Sets, except that the interfaces emphasize
      that the contained elements are key-value pairs.  Typically the
      keys are either integers or strings, but can generally be any
      data type that can be ordered.  A simple example would be a Map
      that whose keys are the names of students in a classroom, and
      the values are their grades.  A more relevant example would be a
      Map that provides efficient acces to a sparse array where the
      keys are the indices for the non-zero array elemnts.


5. Iterators.  Iterators provide a simple/starndard mechanism to
   efficiently loop through all of the elements in a container.  For
   containers such as Vector and Array, these are relatively simple,
   but for containers such as Set and Map that are implemented with
   binary trees, the iterator abstraction is extremely beneficial to
   the user of the container.  Iterators simultaneously hide
   complexity and encourage safe coding styles.  In pseudo-code iterator
   usage is typically something like:

   < declare container > C
   < declare container iterator > I
   < declare element > e

   I = begin(C)
   loop while I is not end(C)
       e = get(I)
       < do something with e >
       next(I)
   end loop


Containers and Fortran:
------------------------

Infrastructure layers for complex Fortran applications often
implicitly/unknowingly implement crude versions of containers like
vectors and associative arrays.  The Vector pattern arises naturally
when the total number of instances of some collection cannot be
determined via a simple formula or when elements are added
sporadically through different drivers.  Intead the size is
"discovered" through some iterative process.  A common implementation
in the simplest case is a two-pass algorithm.  The passes are nearly
identical except that the first pass just accumulates the number of
elements.  An allocation is then made and the 2nd pass fills in the
allocated structure.

The need for Map containers arises when we need to find entities
through som registry.  E.g., if an ocean component needs the surface
winds from an atmospheric component, an abstract coupler enables this
by allowing both components to use the names 'U-wind' and 'V-wind' to
retrieve the associated arrays.  This avoids hardcoding indices, which
is important when considering independently developed components.

Typical Fortran implementations of these patterns are ad-hoc.  Even
within the same application two different layers may implement the
same pattern in rather different ways.  The layers also often contain
latent bugs that are not exercised in the current component, but
prevent their adoption in other components.


Obstacles to clean/robust implementation of containers in Fortran:
------------------------------------------------------------------

One large obstacle to development and use of containers in Fortran is
the lack of support for generic coding.  Developers must either
duplicate the logic of a container for each potential type of element,
or must resort to non-Fortran means (preprocessors) to have a single
implementation that is applicable to all cases.  E.g., one may wish to
have Vectors of integers, reals, logicals, and/or various derived
types, leading to many almost identical implementations and all the
associated dangers of long-term maintenance.  With Maps, the problem
just grows combinatorially as one considers variant types of keys.

Another obstacle, albeit less severe, is the inability to overload
suitable operators for container accessors.  With Fortran arrays,
parens can be used to access individual elements on both the left and
right hand sides of assignment:

      x = a(i,j,k)
      b(i,j,k) = y


Ideally, Vector and Map containers would have similar mechanism for
accessing and modifying elements.  Note that containers generally
return pointers to contained elements rather than copies of those
elements.


State of the art?
-----------------

With aggressive and complicated use of CPP/FPP preprocessing it is
possible to write robust generic Vector and Map containers in Fortran.
I and my colleague Doron Feldman have produced such a package which
was recently released as open source: gFTL.  Another similar package,
FTL, has also been independently released.  (They beat us to the
cooler name.)  These implementations are a nice step forward, and have
been of enormous benefit in the development of new powerful software
layers.  However, these containers are still not as simple to use as
one could hope for number of reasons:

1. A separate Fortran class must be constructed for each contained
   type.  In languages with stronger support for containers, this step
   is performed by the compiler.

2. Even if two separate projects use the same container package (e.g.,
   gFTL), they cannot assume that the other is providing a common
   container and must redeclare for themselves.  One easily ends up
   with many all-but-identical modules for say IntegerVector.

3. Iterators are handled via Fortrans DO WHILE construct.  Users then
   put a call to iter%next() at the end of the loop.  This works well
   in simple cases, but is dangerous in the presence of CYCLE because
   the user must remember to invoke iter%next() there as well.  Other
   languages that have an "increment" component to their loop
   constructs (e.g., C, C++) can handle iterators more robustly by
   putting the iterator increment into the header of the loop.  This
   concern is probably more general than just iterators and should
   perhaps be a paper of its own.

4. As mentioned above, the supported accessors are clunky.  The
   inability to use syntax analogous to that of the tuples of indices
   for arrays is unfortunate.  It is a minor annoyance for references
   on the RHS of an asssignment, but much more so for references that
   would otherwise be on the LHS.  E.g., if we hava a containe for a
   sparse array and want to modify an element:

      CALL sparse%set([i,j], x)  ! supported but inelegant

   compared to

      sparse(i,j) = x  ! not implementable in Fortran 2018

5. Containers of pointers are problematic.  Fortran lacks a robusts
   mechanism to order pointers via some COMPARE method.  This prevents
   any standard-conforming implementation of a Set of pointers or a
   Map that has keys that are pointers.  Note that usual trick wrapping the
   pointers in a derived type does not help with this issue.


What could be done in the language?
------------------------------------

This is a multi-faceted use case and there are a number of fronts on
which the language could be improved to support containers.  Any of
the following would be of some value even without the others:

1) Generic programming to facilitate user-implementation of
   containers that could be used with arbitrary types.  (Ideally,
   community supported packages would then emerge ala C++ STL).

2) Widen the class of overloaded "punctuation" to allow some form of
   paren/bracket notation for specifying container elements on both
   LHS and RGS of assignments.

3) Make specific types containers first class language entities, ala
   existing arrays.

4) Provide an intrinsic COMPARE that arbitrarily (but consistently)
   orders two pointers with the same declared type.