X3J3/96-060

Date:     March 11, 1996
To:       X3J3
From:     William B. Clodius
          44 Los Arboles Dr.
          Los Alamos, NM 87544
Subject:  Comments on sets and their inclusion in Fortran


This note has three parts; first, a discussion of the general requirements
for the inclusion of sets in Fortran; second, a listing of various aspects of
Fortran that could benefit by the inclusion of the concept of sets; finally,
an illustration of a possible syntax as applied to condition handling.

I.  Fortran and sets

Sets are a general form of collection.  Fortran currently utilizes a more
specialized form of collection, arrays.  It is therefore useful to point out
the characteristics of collections, and how the inclusion of sets enhances
the collection capabilities of Fortran.  Much of the following discussion is
based on that of Sipelstein and G.E. Blelloch [1].

Collections may be categorized by three attributes: homogeneity of elements,
structure of elements, and ordering of elements.  Currently Fortran requires
that its collections be homogeneous, as opposed to SETL [2,3] which allows
its collections to be heterogeneous.  Fortran allows its collections to have
both atomic elements, e.g., Fortran's intrinsic types, and structured
elements, e.g., derived types, but not other collections, as opposed to
CM-LISP, which allows not only atomic and structured elements, but also other
collections as elements.  There are at least four ways of ordering
collections: unordered, as in sets; sequential, as in lists and trees;
indexed grids, as in arrays; and key ordered, as in SETL's maps. Fortran
currently explicitly has only indexed grids, while SETL, for example, has the
other three access methods.

The homogeneity requirement on Fortran's collections is useful, as this makes
the code clearer for the reader and writer, and simplifies compiler
optimizations.  However, the meaning of homogeneity could be extended should
the language become object oriented.  The homogeneity requirement means that
Fortran should be able to distinguish between sets of values used to define
types (enumerations in some languages), sets that contain values, and sets
which contain objects (actually references to the objects) that are
associated with values.  In the later case the values should be of the same
type.

Fortran 90/95 allow the mimicking of the characteristics of arrays of arrays,
through arrays of components of a derived type, which in turn contain arrays,
but this requires a significant effort on the programmer's part.  It also
obscures the intent from the reader, and perhaps the compiler.  Current plans
appear to include having the capability of directly defining arrays of
arrays, i.e., one form of collections of collections.  It is suggested that
the syntax be designed to also allow arrays of sets, sets of arrays, and sets
of sets.

The addition of sets to the language requires defining several operations or
functions: operations yielding a set from two compatible sets include union,
intersection, difference, and symmetric difference; operations yielding a
logical value from two compatible sets include intersect (if the sets have
elements in common), includes in various forms (if all elements of the second
set are elements of the first set), equivalent (if the sets contain identical
elements); an operation returning the number of elements of the set; an
operator to determine if an element of a set is contained in a subset of the
set; and assignment associating the elements of one set with another set.

Note although only sets are addressed by this proposal, Namelist, user
defined types, and optional arguments in F90 resemble key-ordered elements,
while keyed files, a proposed addition to F2000, also implements key
ordering.  Consistency suggest that this concept should also be included in
the language.

II.  Applications

There are of course many aspects of Fortran that can benefit by being
described in terms of sets:

1.  Types in general are sets of values.  It has been often been found to be
useful to represent these values as an abstract set.  The Fortran standard
allows this for user defined types with private components, but this requires
the implementer to specify the details of the implementation.  For a finite
set of values, e.g., WEEK = (/ MONDAY, TUESDAY, ..., SUNDAY /), much of this
detail is unnecessary and can be handled by the compiler.  Most languages let
the compiler handle such details for selected types, either in a simple and
error prone manner, e.g., C's enums, or abstractly, e.g., Modula 3's sets
[4].

2. There has recently been a desire for Fortran to implement object oriented
programming facilities.  The concepts of inheritance and polymorphism are
best described in terms of sets of types [5] and the inclusion of a set type
in Fortran would encourage programmers to think in those terms.  This has
been used to good effect in at least one object oriented programming language
designed for use by scientists [6].

3.  By associating KIND values with explicit, but system dependent, integers
instead of with abstract indicators of types that are elements of sets of
compatible types, the principle of encapsulation was violated.  As a result
programmers are tempted to use explicit KIND values in a non-portable way.
 It would be useful for portability (and the inclusion of object oriented
capabilities) to have a general type called KIND, whose values are only used
in assignment to objects of the appropriate KIND type, and in type
parameterization.  Note the the KIND type should also have KIND values that
distinguish each separate group of compatible types selectable with KIND
values, e.g., INTEGER, REAL, ....  This construct should be extendible to
user defined sets of compatible types, with potential application to object
oriented capabilities.  If object oriented programming is introduced into
Fortran, sets of KIND values, instead of individual values, can be used to
specify when polymorphism is to be used, in a manner similar to the subtyping
features of Sather [6].

4.  The range (I:J) and triplet (I:J:K) notation in Fortran represent sets of
integers.  It would be useful in many cases (particularly for repeatedly used
indices but also for parameterized user defined types) if range and triplet
values could be used as arguments or entities.  For Fortran this requires
that they be defined as a type.  Such a capability is already implemented to
a limited extent in languages such as Modula 3 [4].

5.  X3J3 has nominal plans to add interval real arithmetic to the Fortran
standard.  Real intervals in many ways are analogous to integer ranges and
the triplet notation for array indices, in that all involve bounded sets of
numbers.  Consistency suggests that intervals should have a notation and
expression under Fortran's syntax similar to that of integer ranges, but the
current proposal apparently uses a notation similar to that of complex
numbers.

6.  All of the early drafts of the condition handling proposal relied on
explicitly associating conditions with system dependent integers.  This again
violated the principle of encapsulation, and would have encouraged a
non-portable code style.  The latest draft currently relies on a derived type
which is essentially a private integer value.  This reduces the
non-portability of the condition handling code, but still does not allow the
condition handling system capable of handling the case when multiple
conditions are thrown before they are caught.  A recent thread in the
newsgroup comp.lang.c++, indicates that this lack of flexibility has reduced
the robustness of many codes.  This flexibility can be obtained by allowing
the signal when caught to have multiple values, i.e., have handling for a set
of values and not just one value.

7.  There has long been a general desire for the implementation of a bit data
type in Fortran.  Most of the applications suggested for this type appear to
be better addressed with subsets of predefined sets.  The suggestion that a
bit data type be used appears to be prompted by the requesters familiarity
with bit data types in other languages, and a lack of familiarity with sets.
(Note it is possible to implement subsets of predefined sets as bit values.)

8. With its emphasis on arrays it is easy to parallelize Fortran code for
many, but by no means all, problems of scientific interest.  Other less
structured problems are best dealt with in terms of less structured
constructs, e.g., sets, or aggregates.  Languages that include such
constructs, e.g., SETL or Connection Machine LISP, are more readily optimized
for such problems [1]. To some extent, array assignments, and the FORALL and
WHERE constructs allow arrays to be treated similar to sets as the ordering
of arrays need not be explicit, but it is difficult to do this in a
consistent and efficient manner with Fortran's arrays.

III. Illustrative syntax

The following syntax is for illustrative purposes only.  Because I feel that
an immediate application of sets would be in the description of exceptions, I
have chosen this as an example, based on Reid's condition handling proposal,
although signaling of some intrinsic conditions could not be implemented
directly through Fortran.  Possible problems with this syntax includes the
lack of clear syntactical distinction between sets and arrays, and the
overloading of some arithmetic operators.

MODULE CONDITIONS
   IMPLICIT NONE

! The following illustrates the definition of a fixed set of
! named value elements of a given type.
   TYPE(CONDITION), PUBLIC, SET, PARAMETER :: &
      STANDARD_CONDITIONS = &
      (/ ALLOCATION_ERROR, DEALLOCATION_ERROR, &
        INSUFFICIENT_STORAGE, BOUND_ERROR, SHAPE, MANY_ONE, &
        NOT_PRESENT, UNDEFINED, IO_ERROR, END_OF_FILE, &
        END_OF_RECORD, OVERFLOW, INEXACT, UNDERFLOW, &
        DIVIDE_BY_ZERO, INVALID, INTEGER_OVERFLOW, &
        INTEGER_DIVIDE_BY_ZERO /)
! The following defines an open set of elements of a given type
   TYPE(CONDITION), PUBLIC, SET(:) :: SIGNALING_CONDITIONS, &
      SUSPENDED_CONDITIONS
! Note the following creates subsets whose elements are arrays
   TYPE(CONDITION), PUBLIC, SET, PARAMETER :: SYSTEM = &
      (/ SYSTEM_ERROR(1:6) /)
   TYPE(CONDITION), PUBLIC, SET, PARAMETER :: USER = &
      (/ USER_ERROR(1:8) /)

!  The following defines a set that is the union of other sets
   PUBLIC SET TYPE CONDITIONS = STANDARD_CONDITIONS + &
      SYSTEM_ERROR + USER_ERROR(8)

! The following illustrate the definition of unmodifiable
! entities of that are subsets of the set of conditions.
!
   SUBSET(STANDARD_CONDITIONS), PARAMETER, PUBLIC :: STORAGE =  &
      (/ ALLOCATION_ERROR, DEALLOCATION_ERROR, &
      INSUFFICIENT_STORAGE /)
   SUBSET(STANDARD_CONDITIONS), PARAMETER, PUBLIC :: IO = &
      (/ IO_ERROR, END_OF_FILE, END_OF_RECORD /)
   SUBSET(STANDARD_CONDITIONS), PARAMETER, PUBLIC :: FLOATING = &
      (/ OVERFLOW, INVALID, DIVIDE_BY_ZERO /)
   SUBSET(STANDARD_CONDITIONS), PARAMETER, PUBLIC :: INTEGER = &
      (/ INTEGER_OVERFLOW, INTEGER_DIVIDE_BY_ZERO /)

! The following illustrates the definition of a subset as a
! union of other subsets.
   SUBSET(STANDARD_CONDITIONS), PARAMETER :: USUAL = STORAGE + &
      IO + FLOATING + INTEGER
   SUBSET(STANDARD_CONDITIONS), PARAMETER :: ERRORS = USUAL + &
      SYSTEM + USER

! The following defines a subset as the difference between a set
! and its subset.
   SUBSET(CONDITIONS), PARAMETER :: WARNINGS = CONDITIONS - ERRORS

! The following defines an array indexed by a set
   CHARACTER*80, PARAMETER :: STANDARD_MSGS(STANDARD_CONDITIONS) &
      = (/ 'An allocation error was detected ... /)

CONTAINS

   SUBROUTINE SIGNAL(NMEW_SIGNAL)
      TYPE(CONDITION) :: NEW_SIGNAL
      ...
      SIGNALING_CONDITIONS = SIGNALING_CONDITIONS + &
        (/ NEW_SIGNAL /)
      ...
   END SUBROUTINE SIGNAL
   SUBROUTINE SUSPEND_SIGNALS
   ...
   SUSPENDED_SIGNALS = SUSPENDED_SIGNALS + SIGNALING_CONDITIONS
   SIGNALING_CONDITIONS = (/ /)
   ...
   END SUBROUTINE SUSPEND_SIGNALS

END MODULE CONDITIONS

MODULE SYSTEM_X
   IMPLICIT NONE
! The following illustrates how a vendor could provide a module
! describing system dependent conditions.
   USE CONDITIONS :: ONLY SYSTEM => SYSTEM_CONDITIONS
   TYPE(CONDITIONS), PARAMETER, SET :: SYSTEM_X_CONDITIONS = &
      (/ INTERRUPT, SYSTEM_SHUTDOWN, ... /)
   EQUIVALENCE (SYSTEM_X_CONDITIONS, SYSTEM_CONDITIONS)
...
END MODULE SYSTEM_X

References

1. J.M. Sipelstein and G.E. Blelloch, "Collection-Oriented Languages,"
Carnegie Mellon University, School of Computer Science report, CMU-CS-90-127,
March 18, 1991. (I believe a subsequent version of this article was recently
published in an IEEE journal.)

2. J.T. Schwartz. R.B.K. Dewer, E. Dubinsky, and E. Schonberg. "Programming
with Sets: An Introduction to SETL." Springer-Verlag, New York, 1986.

3.  W. Kirk Snyder, "The SETL2 Programming Language," Technical Report 490,
Courant Institute of Mathematical Sciences, New York University, September 9,
1990.

4. Samuel P. Harbison, "Modula-3." pp. 92-97, Prentice-Hall, Engelwood
Cliffs, New Jersey, 1992.

5. Bertrand Meyer, "Eiffel: the language," pp. 324-324, Prentice-Hall, New
York, 1992.

6.  Steve Omohundro and David Stoutamire, "The Sather 1.0 Specification,"
International Computer Science Institute, 1947 Center Street, Suite 600,
Berkeley, California 94704, December 2, 1994.