J3/98-113

Date:     30 Jan 1998
To:       J3
From:     Dick Hendrickson
Subject:  4f:  Constants for Opaque Data Types

This is a discussion paper and Specs for item 4f, Constants for Opaque Data
Types, for Interval Arithmetic Enabling Technologies.  Assuming there is
general agreement, I think this is detailed enough to also be the syntax
document.

The driving force behind this is to provide a way to specify constants for
interval arithmetic that would be useful even IF interval arithmetic were
implemented as a user written module.  The problem is that Fortran treats
a literal constant, like 3.14, as a real literal constant and converts it to
an internal floating point representation before it can be special cased as
an opaque constant.  This is one of the reasons why something simple like

interval_constructor_function(3.14)

won't work.  Unless the function is truly intrinsic, the compiler will
convert 3.14 to binary and pass that to the function, so the exact decimal
value the user wrote is lost before the function ever sees it.  Also,
creating a magic constructor for interval constants doesn't help anyone who
wants to invent their own derived type and have constants for it.
Similarly, inventing a new operator pair, like [3.14], doesn't help.
We could make the [...] combination be magic for intervals, but what will a
user do?  It's really no different from a function reference.

OVERLOADED CONSTRUCTOR FUNCTIONS

I think the solution is to build on the already existing derived type
constructor functions by allowing overloads.  It's up to the module writer
to provide as many options as necessary, subject to the normal Ch 14 rules
that they must be unique at compile time.  We probably want an additional
rule saying an overload can't exactly match the normal derived type
constructor function, although that wouldn't be necessary for opaque types.
So, the interval arithmetic module would probably define overloaded
constructor functions which take one or two character strings as arguments
and return an interval.  Something like

interval("3.14") or
interval("3.135, 3.145") or
interval("3.14", ".00001")

For interval arithmetic it's natural to use character strings as the
arguments to avoid having the automatic conversion to internal floating
point before the function invocation.  If a person forgot the quote marks
he'd get a compile time error for a missing interface (unless the interval
module definers choose to define an overload for ordinary reals as well).
To know exactly what the different forms mean a user would have to read the
interval module documentation, but that's already true of many existing
non-interval-constructor functions.  Presumably the interval module
designers will make the constructor interface independent of the details of
the opaque type.

This is really just the normal function notation for a function with some
arguments that returns a result of user defined type.  This adds a little
syntactic sugar to make it more obvious what is going on and to make it
possible to construct entities with opaque insides without having to know
the representation.
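
As a rough illustration of what a module writer might provide, here is a
minimal sketch.  It does not use the CONSTRUCTOR (type-name) generic-spec
proposed in this paper; instead it assumes the form later standardized in
Fortran 2003, where a generic interface may have the same name as the
derived type.  The module name, the component names lo and hi, the
accessors lower and upper, and the reading of the second string as a
half-width are all assumptions made for the example.  The bounds are
widened by one representable number, which is a crude stand-in for the
rigorous outward rounding a real interval package would do.  The
interval("3.135, 3.145") form is not covered.

```fortran
! Sketch only: overloads of the constructor name for an opaque type.
module interval_sketch
  implicit none
  private
  public :: interval, lower, upper

  type :: interval
    private                       ! opaque: users cannot see lo and hi
    real :: lo = 0.0, hi = 0.0
  end type interval

  interface interval              ! generic with the same name as the type
    module procedure from_text, from_text_and_width
  end interface interval

contains

  ! interval("3.14"): convert the text, then step one representable
  ! number outward on each side of the converted value.
  function from_text(text) result(x)
    character(len=*), intent(in) :: text
    type(interval) :: x
    real :: mid
    read (text, *) mid
    x%lo = nearest(mid, -1.0)
    x%hi = nearest(mid, +1.0)
  end function from_text

  ! interval("3.14", ".00001"): midpoint and half-width (an assumption).
  function from_text_and_width(text, width) result(x)
    character(len=*), intent(in) :: text, width
    type(interval) :: x
    real :: mid, w
    read (text, *) mid
    read (width, *) w
    x%lo = nearest(mid - abs(w), -1.0)
    x%hi = nearest(mid + abs(w), +1.0)
  end function from_text_and_width

  real function lower(x)
    type(interval), intent(in) :: x
    lower = x%lo
  end function lower

  real function upper(x)
    type(interval), intent(in) :: x
    upper = x%hi
  end function upper

end module interval_sketch

program demo
  use interval_sketch
  implicit none
  type(interval) :: a, b
  a = interval("3.14")             ! resolved to from_text
  b = interval("3.14", ".00001")   ! resolved to from_text_and_width
  print *, lower(a), upper(a)
  print *, lower(b), upper(b)
end program demo
```

Because the components are private, code outside the module can build an
interval only through these overloads, which is the point of the opaque
constructor interface.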

MAGIC CHARACTER TO DERIVED TYPE CONVERSION

That's a pretty verbose notation for a constant and since the "..." is
likely to be a useful way to send information into an opaque type (for
example intervals) I propose that we also "overload" the conversion between
characters and other types.

The logic when the compiler sees an operation on a character string is:

1)  is it an intrinsic operation?  If so, do it

2)  is it a user defined operation in the F95 sense?  If so, do it

3)  is there an accessible overload to a constructor function that will
unambiguously turn the character string into the needed type?  If so, do
that and then perform the operation using the result.

4)  issue an error message or start WWIII or whatever

In effect, the compiler uses a constructor function to convert the
character string to the type of the other operand.

If X is a type(interval) variable then

X = "3.14"
is equivalent to
X = interval("3.14")

and

print *, X+"3.14"
is equivalent to
print *, X+interval("3.14")

assuming the interval module defines an overload to convert character
strings into entities of type(interval).
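
For comparison, the same two statements can be made to work without any
magic if the module writer spells out each character case by hand as a
defined assignment and a defined operation.  The sketch below shows
roughly what the proposed conversion would save the module writer from
writing.  All names (interval_ops_sketch, from_text, assign_text,
add_interval_text, lower, upper) are hypothetical, the constructor overload
uses the Fortran 2003 generic-named-as-type form rather than the syntax
proposed here, and the one-step outward widening is only a crude
approximation of proper interval rounding.

```fortran
! Sketch only: hand-written equivalents of X = "3.14" and X + "3.14".
module interval_ops_sketch
  implicit none
  private
  public :: interval, assignment(=), operator(+), lower, upper

  type :: interval
    private
    real :: lo = 0.0, hi = 0.0
  end type interval

  interface interval
    module procedure from_text
  end interface interval

  interface assignment(=)          ! X = "3.14"
    module procedure assign_text
  end interface

  interface operator(+)            ! X + Y and X + "3.14"
    module procedure add_intervals, add_interval_text
  end interface

contains

  function from_text(text) result(x)
    character(len=*), intent(in) :: text
    type(interval) :: x
    real :: mid
    read (text, *) mid
    x%lo = nearest(mid, -1.0)
    x%hi = nearest(mid, +1.0)
  end function from_text

  ! Convert the string with the constructor, then assign the result.
  subroutine assign_text(x, text)
    type(interval), intent(out) :: x
    character(len=*), intent(in) :: text
    x = from_text(text)
  end subroutine assign_text

  function add_intervals(a, b) result(c)
    type(interval), intent(in) :: a, b
    type(interval) :: c
    c%lo = nearest(a%lo + b%lo, -1.0)
    c%hi = nearest(a%hi + b%hi, +1.0)
  end function add_intervals

  ! Convert the string operand first, then use the interval operation.
  function add_interval_text(a, text) result(c)
    type(interval), intent(in) :: a
    character(len=*), intent(in) :: text
    type(interval) :: c
    c = add_intervals(a, from_text(text))
  end function add_interval_text

  real function lower(x)
    type(interval), intent(in) :: x
    lower = x%lo
  end function lower

  real function upper(x)
    type(interval), intent(in) :: x
    upper = x%hi
  end function upper

end module interval_ops_sketch

program demo
  use interval_ops_sketch
  implicit none
  type(interval) :: x, y
  x = "3.14"                       ! calls assign_text
  y = x + "3.14"                   ! calls add_interval_text
  print *, lower(y), upper(y)
end program demo
```

Every operator needs its own character variant in this style (and a third
one for a string on the left), which is the verbosity the automatic
conversion is meant to remove.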

Something like

call a_routine_that_expects_an_interval_argument("3.14")

could, in principle, be defined to automatically invoke the constructor IF
the compiler could see an unambiguous interface and know that the routine
wanted an interval argument.  Since we don't do this with calls in general
I wouldn't do it here either.

Things like
"3.14" + 2.71
or
"3.14" + "2.71"
will NOT invoke the character to interval constructor because there is
nothing in the expressions to imply intervalness.  Someone might choose to
define overloads for C+R or C+C, but that's the normal Fortran 95 operator
overloading process.

I don't think this causes parsing problems even if there are variables with
different types involved.  Suppose X is of type(interval) and Y is of
type(complex_interval) then I think

X + "3.14" + Y

is unambiguous by the current chapter 7 rules.  It's equivalent to

(X + interval("3.14")) + Y

just as

X + 3.14 + Y
is
(X + 3.14) + Y    when the types are user defined.

People who write code which has multiple opaque derived types might have a
confusing time sorting out the different meanings of the strings; they
should use the named constructor functions.

OPAQUE CONSTANTS AS INITIALIZERS

To be useful we also have to allow constructor functions as initialization
expressions in type declaration statements.

For initialized entities I think the only two additional rules we need are
1) the user defined interval constructor function must meet the rules for
specification functions (96:13 in F95) and have initialization expressions
as arguments, and 2) these entities can appear in other initializers, but
not in specification expressions.

Taken together, these 2 rules allow (but don't require) the compiler to
defer evaluation of the initialization expression until run-time.  The
first rule allows the compiler to evaluate the "parameters" in any order
and to evaluate them every time a subroutine is entered.  The second rule
means they can't be used for things that must be known at compile-time,
such as type parameters or array bounds.  The latter is maybe a little
restrictive and we could relax this somewhat.

In a procedure a compiler would be free to evaluate the constructors at
compile time (perhaps by examining the constructor function), to evaluate
them every time the procedure is entered, or to evaluate them on the first
entry and save the results.

For parameters and initialized variables above a contains statement in a
module the compiler will probably have to come up with a new strategy if it
defers initialization until runtime.  One possibility is for the "linker"
to build a start-up procedure that computes and assigns the values before
the real program starts.  Another is for the compiler to create hidden
accessor functions and reference the values with a function call.
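
The hidden accessor strategy, combined with "evaluate on the first entry
and save the results", can be sketched by hand as follows.  This is only
an illustration of what a compiler might generate behind the scenes; the
names pi_value, pi_ready, and the accessor pi are hypothetical, and the
constructor overload again uses the Fortran 2003 generic-named-as-type
form with a crude one-step widening of the bounds.

```fortran
! Sketch only: a module "constant" reached through an accessor function
! that runs the constructor on first use and saves the result.
module interval_consts_sketch
  implicit none
  private
  public :: interval, pi, lower, upper

  type :: interval
    private
    real :: lo = 0.0, hi = 0.0
  end type interval

  interface interval
    module procedure from_text
  end interface interval

  ! Hidden storage standing in for the initialized entity.
  type(interval), save :: pi_value
  logical, save :: pi_ready = .false.

contains

  function from_text(text) result(x)
    character(len=*), intent(in) :: text
    type(interval) :: x
    real :: mid
    read (text, *) mid
    x%lo = nearest(mid, -1.0)
    x%hi = nearest(mid, +1.0)
  end function from_text

  ! Accessor: every reference to the constant becomes a call to pi().
  function pi() result(x)
    type(interval) :: x
    if (.not. pi_ready) then
      pi_value = from_text("3.14")   ! deferred initialization
      pi_ready = .true.
    end if
    x = pi_value
  end function pi

  real function lower(x)
    type(interval), intent(in) :: x
    lower = x%lo
  end function lower

  real function upper(x)
    type(interval), intent(in) :: x
    upper = x%hi
  end function upper

end module interval_consts_sketch

program demo
  use interval_consts_sketch
  implicit none
  type(interval) :: p
  p = pi()                           ! first call evaluates the constructor
  p = pi()                           ! later calls reuse the saved value
  print *, lower(p), upper(p)
end program demo
```

The start-up procedure alternative would move the assignment to pi_value
into a routine run before the main program, which removes the test of
pi_ready on every reference.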

This can be used with explicit constructor functions and also with the
magic character form.

type (interval), parameter  ::  pi = "3.14"
type (person)  ::  Al = person("married, with children")

Currently initialization expressions are limited to intrinsic operations.
For derived types that are pretty much numeric it would be nice to be able
to initialize related variables, things like:

type (interval), parameter  ::  twopi = 2*pi

I think that is too much to put in at this time and am not proposing this.
Besides, it is sort of available with the constructor model.  There's no
reason why the module designer can't provide an overload that takes two
operands (as character strings) and a character string operator.  Something
like:

type (interval), parameter  ::  twopi = interval("2", "*", "3.14")
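
A module designer could supply such a three string overload along the
following lines.  This is a sketch under several assumptions: the helper
names are hypothetical, only "+" and "*" are handled, the overload is
declared with the Fortran 2003 generic-named-as-type form, and the bounds
are widened by a single representable number rather than rounded
rigorously.  Since user functions are not allowed in initialization
expressions without the rules proposed above, the demonstration assigns to
a variable rather than declaring a parameter.

```fortran
! Sketch only: a constructor overload taking two operands and an
! operator, all as character strings.
module interval_expr_sketch
  implicit none
  private
  public :: interval, lower, upper

  type :: interval
    private
    real :: lo = 0.0, hi = 0.0
  end type interval

  interface interval
    module procedure from_text, from_expression
  end interface interval

contains

  function from_text(text) result(x)
    character(len=*), intent(in) :: text
    type(interval) :: x
    real :: mid
    read (text, *) mid
    x%lo = nearest(mid, -1.0)
    x%hi = nearest(mid, +1.0)
  end function from_text

  ! interval("2", "*", "3.14"): build both operands, then combine them.
  function from_expression(left, op, right) result(x)
    character(len=*), intent(in) :: left, op, right
    type(interval) :: x, a, b
    real :: p(4)
    a = from_text(left)
    b = from_text(right)
    select case (op)
    case ("+")
      x%lo = nearest(a%lo + b%lo, -1.0)
      x%hi = nearest(a%hi + b%hi, +1.0)
    case ("*")
      ! The product interval is bounded by the extreme endpoint products.
      p = [a%lo*b%lo, a%lo*b%hi, a%hi*b%lo, a%hi*b%hi]
      x%lo = nearest(minval(p), -1.0)
      x%hi = nearest(maxval(p), +1.0)
    case default
      error stop "interval: unsupported operator string"
    end select
  end function from_expression

  real function lower(x)
    type(interval), intent(in) :: x
    lower = x%lo
  end function lower

  real function upper(x)
    type(interval), intent(in) :: x
    upper = x%hi
  end function upper

end module interval_expr_sketch

program demo
  use interval_expr_sketch
  implicit none
  type(interval) :: twopi
  twopi = interval("2", "*", "3.14")
  print *, lower(twopi), upper(twopi)
end program demo
```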

Summary and proposal:

1)  Allow overloading of the derived type constructor function with
CONSTRUCTOR (type-name) as the generic-spec on the interface statement.
The overload can't exactly match the existing constructor function unless
the type is opaque.

2)  Allow magic invocation of a CONSTRUCTOR function when the compiler sees
an operation between a derived type and a character string that isn't a
defined operation if there is an appropriate CONSTRUCTOR available.

3)  Allow form 1) above in initialization expressions with the appropriate
rules.

4)  Also allow form 2) in initialization expressions.

Potential straw votes:

Broaden 2) to allow character variables.

Allow derived type parameters (and constructors) in dimension bound
expressions.  (the likely implementation will be similar to automatic arrays)