X3J3/96-060 Date: March 11, 1996 To: X3J3 From: William B. Clodius 44 Los Arboles Dr. Los Alamos, NM 87544 Subject: Comments on sets and their inclusion in Fortran This note has three parts; first, a discussion of the general requirements for the inclusion of sets in Fortran; second, a listing of various aspects of Fortran that could benefit by the inclusion of the concept of sets; finally, an illustration of a possible syntax as applied to condition handling. I. Fortran and sets Sets are a general form of collection. Fortran currently utilizes a more specialized form of collection, arrays. It is therefore useful to point out the characteristics of collections, and how the inclusion of sets enhances the collection capabilities of Fortran. Much of the following discussion is based on that of Sipelstein and G.E. Blelloch [1]. Collections may be categorized by three attributes: homogeneity of elements, structure of elements, and ordering of elements. Currently Fortran requires that its collections be homogeneous, as opposed to SETL [2,3] which allows its collections to be heterogeneous. Fortran allows its collections to have both atomic elements, e.g., Fortran's intrinsic types, and structured elements, e.g., derived types, but not other collections, as opposed to CM-LISP, which allows not only atomic and structured elements, but also other collections as elements. There are at least four ways of ordering collections: unordered, as in sets; sequential, as in lists and trees; indexed grids, as in arrays; and key ordered, as in SETL's maps. Fortran currently explicitly has only indexed grids, while SETL, for example, has the other three access methods. The homogeneity requirement on Fortran's collections is useful, as this makes the code clearer for the reader and writer, and simplifies compiler optimizations. However, the meaning of homogeneity could be extended should the language become object oriented. The homogeneity requirement means that Fortran should be able to distinguish between sets of values used to define types (enumerations in some languages), sets that contain values, and sets which contain objects (actually references to the objects) that are associated with values. In the later case the values should be of the same type. Fortran 90/95 allow the mimicking of the characteristics of arrays of arrays, through arrays of components of a derived type, which in turn contain arrays, but this requires a significant effort on the programmer's part. It also obscures the intent from the reader, and perhaps the compiler. Current plans appear to include having the capability of directly defining arrays of arrays, i.e., one form of collections of collections. It is suggested that the syntax be designed to also allow arrays of sets, sets of arrays, and sets of sets. The addition of sets to the language requires defining several operations or functions: operations yielding a set from two compatible sets include union, intersection, difference, and symmetric difference; operations yielding a logical value from two compatible sets include intersect (if the sets have elements in common), includes in various forms (if all elements of the second set are elements of the first set), equivalent (if the sets contain identical elements); an operation returning the number of elements of the set; an operator to determine if an element of a set is contained in a subset of the set; and assignment associating the elements of one set with another set. Note although only sets are addressed by this proposal, Namelist, user defined types, and optional arguments in F90 resemble key-ordered elements, while keyed files, a proposed addition to F2000, also implements key ordering. Consistency suggest that this concept should also be included in the language. II. Applications There are of course many aspects of Fortran that can benefit by being described in terms of sets: 1. Types in general are sets of values. It has been often been found to be useful to represent these values as an abstract set. The Fortran standard allows this for user defined types with private components, but this requires the implementer to specify the details of the implementation. For a finite set of values, e.g., WEEK = (/ MONDAY, TUESDAY, ..., SUNDAY /), much of this detail is unnecessary and can be handled by the compiler. Most languages let the compiler handle such details for selected types, either in a simple and error prone manner, e.g., C's enums, or abstractly, e.g., Modula 3's sets [4]. 2. There has recently been a desire for Fortran to implement object oriented programming facilities. The concepts of inheritance and polymorphism are best described in terms of sets of types [5] and the inclusion of a set type in Fortran would encourage programmers to think in those terms. This has been used to good effect in at least one object oriented programming language designed for use by scientists [6]. 3. By associating KIND values with explicit, but system dependent, integers instead of with abstract indicators of types that are elements of sets of compatible types, the principle of encapsulation was violated. As a result programmers are tempted to use explicit KIND values in a non-portable way. It would be useful for portability (and the inclusion of object oriented capabilities) to have a general type called KIND, whose values are only used in assignment to objects of the appropriate KIND type, and in type parameterization. Note the the KIND type should also have KIND values that distinguish each separate group of compatible types selectable with KIND values, e.g., INTEGER, REAL, .... This construct should be extendible to user defined sets of compatible types, with potential application to object oriented capabilities. If object oriented programming is introduced into Fortran, sets of KIND values, instead of individual values, can be used to specify when polymorphism is to be used, in a manner similar to the subtyping features of Sather [6]. 4. The range (I:J) and triplet (I:J:K) notation in Fortran represent sets of integers. It would be useful in many cases (particularly for repeatedly used indices but also for parameterized user defined types) if range and triplet values could be used as arguments or entities. For Fortran this requires that they be defined as a type. Such a capability is already implemented to a limited extent in languages such as Modula 3 [4]. 5. X3J3 has nominal plans to add interval real arithmetic to the Fortran standard. Real intervals in many ways are analogous to integer ranges and the triplet notation for array indices, in that all involve bounded sets of numbers. Consistency suggests that intervals should have a notation and expression under Fortran's syntax similar to that of integer ranges, but the current proposal apparently uses a notation similar to that of complex numbers. 6. All of the early drafts of the condition handling proposal relied on explicitly associating conditions with system dependent integers. This again violated the principle of encapsulation, and would have encouraged a non-portable code style. The latest draft currently relies on a derived type which is essentially a private integer value. This reduces the non-portability of the condition handling code, but still does not allow the condition handling system capable of handling the case when multiple conditions are thrown before they are caught. A recent thread in the newsgroup comp.lang.c++, indicates that this lack of flexibility has reduced the robustness of many codes. This flexibility can be obtained by allowing the signal when caught to have multiple values, i.e., have handling for a set of values and not just one value. 7. There has long been a general desire for the implementation of a bit data type in Fortran. Most of the applications suggested for this type appear to be better addressed with subsets of predefined sets. The suggestion that a bit data type be used appears to be prompted by the requesters familiarity with bit data types in other languages, and a lack of familiarity with sets. (Note it is possible to implement subsets of predefined sets as bit values.) 8. With its emphasis on arrays it is easy to parallelize Fortran code for many, but by no means all, problems of scientific interest. Other less structured problems are best dealt with in terms of less structured constructs, e.g., sets, or aggregates. Languages that include such constructs, e.g., SETL or Connection Machine LISP, are more readily optimized for such problems [1]. To some extent, array assignments, and the FORALL and WHERE constructs allow arrays to be treated similar to sets as the ordering of arrays need not be explicit, but it is difficult to do this in a consistent and efficient manner with Fortran's arrays. III. Illustrative syntax The following syntax is for illustrative purposes only. Because I feel that an immediate application of sets would be in the description of exceptions, I have chosen this as an example, based on Reid's condition handling proposal, although signaling of some intrinsic conditions could not be implemented directly through Fortran. Possible problems with this syntax includes the lack of clear syntactical distinction between sets and arrays, and the overloading of some arithmetic operators. MODULE CONDITIONS IMPLICIT NONE ! The following illustrates the definition of a fixed set of ! named value elements of a given type. TYPE(CONDITION), PUBLIC, SET, PARAMETER :: & STANDARD_CONDITIONS = & (/ ALLOCATION_ERROR, DEALLOCATION_ERROR, & INSUFFICIENT_STORAGE, BOUND_ERROR, SHAPE, MANY_ONE, & NOT_PRESENT, UNDEFINED, IO_ERROR, END_OF_FILE, & END_OF_RECORD, OVERFLOW, INEXACT, UNDERFLOW, & DIVIDE_BY_ZERO, INVALID, INTEGER_OVERFLOW, & INTEGER_DIVIDE_BY_ZERO /) ! The following defines an open set of elements of a given type TYPE(CONDITION), PUBLIC, SET(:) :: SIGNALING_CONDITIONS, & SUSPENDED_CONDITIONS ! Note the following creates subsets whose elements are arrays TYPE(CONDITION), PUBLIC, SET, PARAMETER :: SYSTEM = & (/ SYSTEM_ERROR(1:6) /) TYPE(CONDITION), PUBLIC, SET, PARAMETER :: USER = & (/ USER_ERROR(1:8) /) ! The following defines a set that is the union of other sets PUBLIC SET TYPE CONDITIONS = STANDARD_CONDITIONS + & SYSTEM_ERROR + USER_ERROR(8) ! The following illustrate the definition of unmodifiable ! entities of that are subsets of the set of conditions. ! SUBSET(STANDARD_CONDITIONS), PARAMETER, PUBLIC :: STORAGE = & (/ ALLOCATION_ERROR, DEALLOCATION_ERROR, & INSUFFICIENT_STORAGE /) SUBSET(STANDARD_CONDITIONS), PARAMETER, PUBLIC :: IO = & (/ IO_ERROR, END_OF_FILE, END_OF_RECORD /) SUBSET(STANDARD_CONDITIONS), PARAMETER, PUBLIC :: FLOATING = & (/ OVERFLOW, INVALID, DIVIDE_BY_ZERO /) SUBSET(STANDARD_CONDITIONS), PARAMETER, PUBLIC :: INTEGER = & (/ INTEGER_OVERFLOW, INTEGER_DIVIDE_BY_ZERO /) ! The following illustrates the definition of a subset as a ! union of other subsets. SUBSET(STANDARD_CONDITIONS), PARAMETER :: USUAL = STORAGE + & IO + FLOATING + INTEGER SUBSET(STANDARD_CONDITIONS), PARAMETER :: ERRORS = USUAL + & SYSTEM + USER ! The following defines a subset as the difference between a set ! and its subset. SUBSET(CONDITIONS), PARAMETER :: WARNINGS = CONDITIONS - ERRORS ! The following defines an array indexed by a set CHARACTER*80, PARAMETER :: STANDARD_MSGS(STANDARD_CONDITIONS) & = (/ 'An allocation error was detected ... /) CONTAINS SUBROUTINE SIGNAL(NMEW_SIGNAL) TYPE(CONDITION) :: NEW_SIGNAL ... SIGNALING_CONDITIONS = SIGNALING_CONDITIONS + & (/ NEW_SIGNAL /) ... END SUBROUTINE SIGNAL SUBROUTINE SUSPEND_SIGNALS ... SUSPENDED_SIGNALS = SUSPENDED_SIGNALS + SIGNALING_CONDITIONS SIGNALING_CONDITIONS = (/ /) ... END SUBROUTINE SUSPEND_SIGNALS END MODULE CONDITIONS MODULE SYSTEM_X IMPLICIT NONE ! The following illustrates how a vendor could provide a module ! describing system dependent conditions. USE CONDITIONS :: ONLY SYSTEM => SYSTEM_CONDITIONS TYPE(CONDITIONS), PARAMETER, SET :: SYSTEM_X_CONDITIONS = & (/ INTERRUPT, SYSTEM_SHUTDOWN, ... /) EQUIVALENCE (SYSTEM_X_CONDITIONS, SYSTEM_CONDITIONS) ... END MODULE SYSTEM_X References 1. J.M. Sipelstein and G.E. Blelloch, "Collection-Oriented Languages," Carnegie Mellon University, School of Computer Science report, CMU-CS-90-127, March 18, 1991. (I believe a subsequent version of this article was recently published in an IEEE journal.) 2. J.T. Schwartz. R.B.K. Dewer, E. Dubinsky, and E. Schonberg. "Programming with Sets: An Introduction to SETL." Springer-Verlag, New York, 1986. 3. W. Kirk Snyder, "The SETL2 Programming Language," Technical Report 490, Courant Institute of Mathematical Sciences, New York University, September 9, 1990. 4. Samuel P. Harbison, "Modula-3." pp. 92-97, Prentice-Hall, Engelwood Cliffs, New Jersey, 1992. 5. Bertrand Meyer, "Eiffel: the language," pp. 324-324, Prentice-Hall, New York, 1992. 6. Steve Omohundro and David Stoutamire, "The Sather 1.0 Specification," International Computer Science Institute, 1947 Center Street, Suite 600, Berkeley, California 94704, December 2, 1994.