J3/98-217


Subject: Comments on OOP in Fortran 2000
Author:  Werner Schulz
Date:    25 October 1998

I have criticized the current plans for object-oriented programming in
Fortran 2000 on various occasions, most recently in an email to
comp-fortran-90 on 10 Aug 1998.  Unfortunately, hardly any of my criticisms
are reflected in the recent F2000 draft (98-007r3).  Several serious
problems are still present and must be resolved before the current drafting
process by J3 comes to its end.

I will mix criticism of the draft with alternative proposals and remarks.  I
want to emphasize that my proposals allow for better compile-time checks
while removing the run-time type errors that mar the current F2000 draft
(and other OO languages).  But first a

SUMMARY:
--------

I present constructive solutions to various serious shortcomings of the
current F2000 draft in the area of object-oriented programming:

-The proposed syntax is simpler, more concise and more consistent than that
of the F2000 draft; the rules and constraints are easier to understand.

-Certain problems (some future) are avoided by using CLASS instead of TYPE
(the sub-type vs. sub-TYPE problem) while REF replaces the current CLASS for
polymorphic objects.

-A SELF construct is introduced that avoids the PASS_OBJ attribute and
non-pointer polymorphic objects.

-ABSTRACT and FROZEN classes are defined in a consistent manner. A potential
source of run-time errors is avoided. The inconsistent use of NULL() in
deferring type-bound procedures is eliminated.

-Polymorphic objects are clearly defined and simple rules are established,
including rules for arrays of such objects; potential sources of run-time
errors are avoided.

-A special but important and completely safe version of covariance is added.

-A proposal for a more regular syntax of procedure declarations is given.

-Some related issues are touched on.

(END OF SUMMARY)

This paper consists of these main sections:

  a) The Object View: some simple thoughts on a basic syntax
  b) Inheritance:     mostly abstract classes and new syntax for procedures
  c) Polymorphism:    Dos and Don'ts for polymorphic objects
  d) Related Issues:  some important leftovers

The Object View:
----------------

While the TYPE construct has been a useful one, it addresses only part of
the needs. Classes in OOP offer a better approach. A class consists of
variables (the 'state') and procedures (the 'behaviour'); objects are
instances of classes. A simple example follows:

   class :: Point
       self :: me
       real :: x, y
       function get_x()
           get_x = me%x
       end function get_x
       function length()
           length = sqrt( me%x**2 +me%y**2 )
       end function length
       subroutine move( d )
           class(Point), intent(in) :: d
           me%x = me%x +d%x
           me%y = me%y +d%y
       end subroutine move
   end class Point

   Usage:

   class(Point) :: A = Point(x=1.0,y=1.0)  ! A is an instance of class Point
   class(Point) :: B = Point(x=2.0,y=1.0)
   write(*,*) A%length()                   ! even better would be A%length
   call A%move( B )

This example is self-explanatory. A couple of points are worth noting
though:

The code is very concise and easy to read; it can hardly be shorter in
Fortran.

The SELF declaration provides a useful and necessary means to refer to the
whole object and is very natural; 'me' is not a class component.  A SELF
construct is found in many OO languages (Ada95 being a rather unconvincing
exception), usually as a reserved word. This is not possible in Fortran,
hence the above version.  SELF is independent of inheritance and
polymorphism, a point completely missed in the F2000 draft.

Unlike in the F2000 draft, variables and procedures are treated on the same
footing.  They are physically and syntactically enclosed in the class scope;
the dummy argument list of class procedures corresponds exactly to the
actual argument list since there is no need for a PASS_OBJ attribute.
Compare the above example with a version written in the current draft
syntax:

MODULE POINT_MODULE

    TYPE, EXTENSIBLE :: POINT
       REAL :: x, y
    CONTAINS
       PROCEDURE, PASS_OBJ :: LENGTH => LEN_2D
       ! etc.
    END TYPE POINT

    PRIVATE :: LEN_2D

CONTAINS      ! module procedures must follow a CONTAINS statement

    FUNCTION LEN_2D( P )
        CLASS(POINT) :: P
        LEN_2D = SQRT( P%x**2 +P%y**2 )
    END FUNCTION LEN_2D

END MODULE POINT_MODULE

Here, LEN_2D is a private module procedure bound to POINT via => (how many
users will forget to make LEN_2D private?). This all smells very strongly of
a Fortran-to-C translation rather than good Fortran design.  The F2000 draft
is rather lengthy, complicated, opaque (is PROCEDURE :: LENGTH a function or
a subroutine?) and convoluted with unnecessary flexibility. The line
containing PROCEDURE mangles a type-bound procedure with a module procedure
and renames it, with a strong hint of procedure pointers, all in one line
when only a simple class procedure is needed.
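
For comparison, the same class can be sketched in the syntax that was later
standardized as Fortran 2003 (an assumption made for illustration: it is
neither the 98-007r3 draft syntax nor the CLASS syntax proposed here). The
passed-object dummy argument 'me' takes over the role of SELF, and the
specific names len_2d and move_2d are hidden by a default PRIVATE statement:

```fortran
! Sketch only: Fortran 2003 syntax, not the 98-007r3 draft syntax.
module point_module
    implicit none
    private                      ! specific procedure names stay private
    public :: point

    type :: point
        real :: x = 0.0, y = 0.0
    contains
        procedure :: length => len_2d   ! PASS is the default
        procedure :: move   => move_2d
    end type point

contains

    ! The passed-object dummy argument plays the role of SELF / 'me'.
    function len_2d(me) result(r)
        class(point), intent(in) :: me
        real :: r
        r = sqrt(me%x**2 + me%y**2)
    end function len_2d

    subroutine move_2d(me, d)
        class(point), intent(inout) :: me
        class(point), intent(in)    :: d
        me%x = me%x + d%x
        me%y = me%y + d%y
    end subroutine move_2d

end module point_module

program demo_point
    use point_module
    implicit none
    type(point) :: a, b

    a = point(x=1.0, y=1.0)
    b = point(x=2.0, y=1.0)
    call a%move(b)                ! a is now (3.0, 2.0)
    write(*,*) a%length()         ! sqrt(13.0)
end program demo_point
```

No PASS_OBJ-like attribute is written in this form, but the dummy argument
list still carries one more argument than the actual argument list.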

A class is a compilation unit and, as should have become clear by now,
combines certain characteristics of MODULE and TYPE. Hence the class syntax
also offers all of the benefits of the two. Classes can be declared inside
MODULEs with the usual rules of access etc.

Classes can have attributes similar to those of the other Fortran types,
e.g. POINTER, PUBLIC, PRIVATE, TARGET, DIMENSION, SAVE, INTENT(...).  Two
additional attributes will be added later.

I prefer the word CLASS over TYPE for two reasons:
a) class is the widely accepted and copied notion in the dominant OO
languages and in the literature on the subject. Why deviate from it? It just
leads to unnecessary confusion among users.

b) The term type has become widely used to describe the public interface of
classes. It is easy to demonstrate that with A a class and B a sub-class of
A and AT and BT the respective types, then BT is not necessarily a sub-type
of AT.  The F2000 draft completely ignores this important subtlety. (Try and
replace class with TYPE in the above sentence.)

A slight change to one class procedure to introduce a new construct:

       subroutine move( d )
           like(me), intent(in) :: d
           me%x = me%x +d%x
           me%y = me%y +d%y
       end subroutine move

The construct LIKE(me) refers to the class of the invoking object.  This
construct is the only uncontroversial and safe form of covariance and solves
the important special case of so-called 'binary methods'.

Inheritance:
------------

A child class 'inherits' the variables and procedures of the parent class
and modifies them in some way or adds new ones or both:

   class :: Point3D
       self    :: me

       inherit :: Point    ! analogous to use A_Module
           redefine :: length, move
       end inherit Point

       real    :: z        ! new variable

       function length()
           length = sqrt( x**2 +y**2 +z**2 )   ! me can be omitted
       end function length
       subroutine move( d )
           like(me), intent(in) :: d
           me%x = me%x +d%x
           me%y = me%y +d%y
           me%z = me%z +d%z
       end subroutine move
   end class Point3D

   Usage:
   class(Point3D) :: E = Point3D(x=1.0,y=1.0,z=0.0)
   class(Point3D) :: F = Point3D(x=1.0,y=2.0,z=0.0)
   class(Point)   :: A = Point(x=1.0,y=1.0)

   write(*,*) E%length(), F%length(), A%length()
   call E%move( F )
   !call E%move( A ) ! ILLEGAL

The self-referential constructs change their meaning when inherited. In
GET_X of Point, me is a Point, while in Point3D's inherited GET_X, me refers
to a 3-dimensional point.

Remark: This sounds like some form of polymorphism and compiler writers can
exploit this fact but this should not be used to mangle different concepts
in the language itself. The statement just states the rules needed for
type-checking of expressions at compile-time. The F2000 draft confuses this
simple issue.

F2000 talks of 'type extension' and uses two attributes, 'extensible' and
'extends'. Why extension instead of inheritance? My proposed form is also
easier to adapt to multiple inheritance (a la Eiffel, the only language that
does this right and it's not that complicated), should the need arise later.
A base class is simply a class without an INHERIT declaration.

The rule for modifying inherited procedures is that the interface has to be
the same, i.e. invariant. Covariance is a difficult topic and should be
avoided until a later stage when on-going research has solved the issue.
However, the like(me) construct is a completely safe form of covariance.
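
The binary-method problem that LIKE(me) addresses can be illustrated with a
sketch in Fortran 2003 syntax (an assumption made for illustration; it is
not the draft syntax or the syntax proposed here). An overriding procedure
has to keep the declared type of d, so the check that d really is a Point3D
moves from compile time to run time:

```fortran
! Sketch only: Fortran 2003 syntax, shown for comparison.
module points
    implicit none
    private
    public :: point, point3d

    type :: point
        real :: x = 0.0, y = 0.0
    contains
        procedure :: move => move_2d
    end type point

    type, extends(point) :: point3d
        real :: z = 0.0
    contains
        procedure :: move => move_3d    ! overrides the inherited binding
    end type point3d

contains

    subroutine move_2d(me, d)
        class(point), intent(inout) :: me
        class(point), intent(in)    :: d
        me%x = me%x + d%x
        me%y = me%y + d%y
    end subroutine move_2d

    ! Only the passed-object argument may change its declared type in an
    ! override; d must stay class(point).
    subroutine move_3d(me, d)
        class(point3d), intent(inout) :: me
        class(point),   intent(in)    :: d
        me%x = me%x + d%x
        me%y = me%y + d%y
        select type (d)
        class is (point3d)
            me%z = me%z + d%z
        class default
            ! A plain point has no z; this is detected only at run time.
            write(*,*) 'move: argument is not a point3d, z left unchanged'
        end select
    end subroutine move_3d

end module points

program demo_binary_method
    use points
    implicit none
    type(point3d) :: e, f
    type(point)   :: a

    e = point3d(x=1.0, y=1.0, z=0.0)
    f = point3d(x=1.0, y=2.0, z=3.0)
    a = point(x=1.0, y=1.0)

    call e%move(f)   ! both 3-dimensional: z is updated
    call e%move(a)   ! compiles; the mismatch surfaces only at run time
    write(*,*) e%x, e%y, e%z
end program demo_binary_method
```

With LIKE(me), the second call would instead be rejected by the compiler, as
in the usage example above.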

Abstract and frozen classes:

It is sometimes useful to declare an abstract class, i.e. a class that
cannot have an instance. It is left to a child class to turn everything
concrete. The F2000 draft has some serious problems implementing this. I
suggest the following:

       class, abstract :: airplane
            ! some variable declarations
            subroutine, abstract :: open_landing_gear( )
                ! argument declarations but no body
            end subroutine open_landing_gear
       end class airplane

Rules: A class is abstract if it has the abstract attribute (and in no other
way); a class must be abstract if any of its procedures, inherited or not,
is abstract. An abstract class procedure must declare its interface.  The
above example is easy since it has no argument list.  A concrete procedure
cannot be redefined into an abstract one during inheritance since this would
clash with polymorphism. This is not properly solved in the F2000 draft (a
constraint is missing in R439).

There is no need for strange constructs like NULL(PROCEDURE_NAME) as in R439
of 98-007r3 (see also internal note in section 4.5.1.5 of the draft).
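
For comparison, a sketch of the same idea in Fortran 2003 syntax, which
expresses it with ABSTRACT and DEFERRED plus an abstract interface rather
than with NULL(). The component gear_down, the interface name gear_interface
and the child type glider are hypothetical names introduced for the example:

```fortran
! Sketch only: Fortran 2003 syntax, shown for comparison.
module airplanes
    implicit none
    private
    public :: airplane, glider

    type, abstract :: airplane
        logical :: gear_down = .false.
    contains
        ! Interface is declared, but no body exists in this type.
        procedure(gear_interface), deferred :: open_landing_gear
    end type airplane

    abstract interface
        subroutine gear_interface(me)
            import :: airplane
            class(airplane), intent(inout) :: me
        end subroutine gear_interface
    end interface

    ! A concrete child must supply the deferred procedure.
    type, extends(airplane) :: glider
    contains
        procedure :: open_landing_gear => glider_open_gear
    end type glider

contains

    subroutine glider_open_gear(me)
        class(glider), intent(inout) :: me
        me%gear_down = .true.
    end subroutine glider_open_gear

end module airplanes

program demo_abstract
    use airplanes
    implicit none
    type(glider) :: g
    ! type(airplane) :: a    ! rejected at compile time: airplane is abstract

    call g%open_landing_gear()
    write(*,*) g%gear_down   ! prints T
end program demo_abstract
```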

Frozen (final? but not NON_OVERRIDABLE) classes are classes that cannot be
inherited from; abstract and frozen are mutually exclusive.  The intrinsic
Fortran types can be considered frozen classes. (This would tie everything
quite neatly together except for the need to write class(class_name) in
declarations.)

My proposal so far is easy and simple. However, it requires a new piece of
syntax which in my view has been long overdue (see 98-143). I have taken new
courage to propose this since the draft has opened the door towards such a
syntax addition (see the chapter on IEEE exception handling with
USE, INTRINSIC :: module_name, and the syntax in R437,
PROCEDURE, proc_attribute :: proc_name).
It is hard to understand why this syntax should not be applied uniformly and
generally to all relevant Fortran constructs, for example:

    MODULE B
       USE, PRIVATE :: A
       USE, PUBLIC  :: A, ONLY: A_VAR, A_PROC
    END MODULE B

with rather obvious semantics.

The most pressing need for this syntax is in procedure declarations (see
98-143).  Slightly updated examples are:

    FUNCTION,   OPERATOR(+),     PUBLIC :: ADD_A_TO_B
    SUBROUTINE, ASSIGNMENT(=),   PUBLIC :: ASSIGN_F_TO_E
    SUBROUTINE, GENERIC(DELETE), PUBLIC :: delete_mytype
    SUBROUTINE, GENERIC(DELETE), PUBLIC :: delete_histype

Actually one can do away with FUNCTION/SUBROUTINE in the first two lines.
The PUBLIC attribute only applies to the OPERATOR/ASSIGNMENT/GENERIC part
while the specific name is always PRIVATE. (This is a very reasonable
interpretation: why declare an operator etc. at all if this weren't the
case? And it simplifies three INTERFACE variations.)
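
For reference, the Fortran 90/95 way of writing the first of these
declarations needs an INTERFACE block plus separate accessibility
statements. A minimal sketch, in which the type vec and its components are
hypothetical:

```fortran
! Sketch only: standard Fortran 90/95 operator definition.
module vec_module
    implicit none
    private                          ! add_a_to_b itself stays private
    public :: vec, operator(+)       ! only the type and the operator are exported

    type :: vec
        real :: x = 0.0, y = 0.0
    end type vec

    interface operator(+)
        module procedure add_a_to_b
    end interface

contains

    function add_a_to_b(a, b) result(c)
        type(vec), intent(in) :: a, b
        type(vec) :: c
        c%x = a%x + b%x
        c%y = a%y + b%y
    end function add_a_to_b

end module vec_module

program demo_operator
    use vec_module
    implicit none
    type(vec) :: u, v, w

    u = vec(1.0, 2.0)
    v = vec(3.0, 4.0)
    w = u + v                        ! resolves to add_a_to_b
    write(*,*) w%x, w%y
end program demo_operator
```

The operator, its accessibility and the specific procedure are spread over
three places here, which is what the one-line form above would collapse.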

Remark: A special rule for classes should disallow attribute statements; if
not, the awkward CONTAINS statement must be added to the class syntax.

Polymorphism:
-------------

Polymorphism is in some ways the opposite of inheritance in the following
sense: Inheritance usually implies specialization by adding or modifying the
behaviour of the parent class. Polymorphism turns this around and allows one
to refer to objects of the same subtree, i.e. it addresses the commonality
of these objects. This immediately suggests certain constraints, viz. only
the common aspects can be used, typically only those defined in the class at
the root of the subtree.

Polymorphic objects, as opposed to monomorphic objects, can only be added
safely and consistently to the language as some kind of pointer, though the
details of the compiler implementation are of no interest here.  I propose:

     ref(base_class) :: a_poly_object_name

The potential attribute POINTER is totally superfluous so I leave it out.
Polymorphic objects are only possible in this form. Unlike in the F2000
draft, there are no non-pointer polymorphic objects here. The draft needs
them only because it refuses to use a 'self' construct, for whatever
reasons, and in spite of the fact that non-pointer polymorphic objects can
lead to run-time errors.  I suspect such run-time errors will be quite
frustrating to most Fortran users. This problem is avoided here.

So what is allowed with polymorphic objects?

One can only use those variables and procedures (incl. abstract ones) that
are defined in the base class and are accessible EXCEPT those procedures
that have an argument declared with 'LIKE(self_name)' (See earlier remark!).

Only pointer-assignment is allowed since one cannot assure the correctness
of the ordinary assignment at run-time:

   ref(Point)     :: A
   ref(Point3D)   :: B
   class(Point)   :: P
   class(Point3D) :: Q

   A => P  ! ok
   A => Q  ! ok
   B => Q  ! ok
   A => B  ! ok

   B => P  ! illegal, wrong base type
   B => A  ! illegal, wrong base type
   A =  P  ! illegal
   A =  B  ! illegal

   write(*,*) A%length()  ! ok if A is associated
   call A%move( P )       ! illegal since move has a like(me) argument

All these rules can be checked at compile-time, hence run-time errors based
on type mismatches cannot occur. This should allow for better optimization
of the Fortran code.
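
The closest counterpart to these rules in Fortran 2003 syntax is a
polymorphic pointer, CLASS(...), POINTER. The sketch below (an illustration
under that assumption, not the REF syntax proposed here) mirrors the
assignments above; the rejected statements are left as comments so that the
program compiles:

```fortran
! Sketch only: Fortran 2003 syntax, shown for comparison.
program demo_poly_pointers
    implicit none

    type :: point
        real :: x = 0.0, y = 0.0
    end type point

    type, extends(point) :: point3d
        real :: z = 0.0
    end type point3d

    class(point),   pointer :: a      ! plays the role of ref(Point)
    class(point3d), pointer :: b      ! plays the role of ref(Point3D)
    type(point),    target  :: p      ! monomorphic objects
    type(point3d),  target  :: q

    a => p          ! ok
    a => q          ! ok
    b => q          ! ok
    a => b          ! ok, a now refers to q

    ! b => p        ! rejected at compile time: wrong base type
    ! b => a        ! rejected at compile time: wrong base type
    ! a =  p        ! rejected: no intrinsic assignment to a polymorphic pointer

    ! The dynamic type of the target can still be inspected at run time.
    select type (a)
    type is (point)
        write(*,*) 'a refers to a point'
    type is (point3d)
        write(*,*) 'a refers to a point3d with z =', a%z
    end select
end program demo_poly_pointers
```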

ref(Abstract_Class) is allowed since the target itself must ultimately be a
monomorphic object (named or unnamed) or NULL.

Remark: These rules seem very restrictive but that is a direct consequence
of the nature of polymorphic objects. Other OO languages like Java seem to
be less restrictive but one shouldn't forget that in Java all instances of
user-defined classes are polymorphic from the outset and Java's
assignment(=) corresponds to Fortran's pointer assignment (=>). There is no
equivalent to Fortran's monomorphic classes (except for the primitive ones
like integer, float).

There is no language-defined casting of objects from a subclass to an
ancestral class (i.e.  no conversion of Point3D objects to Point objects).
Languages with such constructs are potentially very dangerous and are
difficult to debug (see C++). If any such facility is needed users should
define it themselves.

How should one interpret this: ref(Point), dimension(10) :: X ?

X is an array of ref(Point), each element can contain a Point or a Point3D
or ..., in other words X is a heterogeneous array. This is unlike the F2000
draft with its homogeneous version which again has potential run-time errors
due to type errors (or has that part been dropped by now?).  Arrays of
polymorphic objects just cannot be made as efficient as arrays of
monomorphic objects.
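
A heterogeneous array of this kind can be approximated in Fortran 2003
syntax by an array of a wrapper type with a polymorphic pointer component.
This is a sketch only, shown for comparison; the wrapper name point_ref is
hypothetical:

```fortran
! Sketch only: Fortran 2003 syntax, shown for comparison.
program demo_hetero_array
    implicit none

    type :: point
        real :: x = 0.0, y = 0.0
    end type point

    type, extends(point) :: point3d
        real :: z = 0.0
    end type point3d

    ! Each element carries its own polymorphic pointer.
    type :: point_ref
        class(point), pointer :: p => null()
    end type point_ref

    type(point),   target :: p2
    type(point3d), target :: p3
    type(point_ref)       :: x(2)
    integer :: i

    x(1)%p => p2        ! element 1 refers to a point
    x(2)%p => p3        ! element 2 refers to a point3d

    do i = 1, size(x)
        select type (elem => x(i)%p)
        type is (point)
            write(*,*) i, 'point'
        type is (point3d)
            write(*,*) i, 'point3d, z =', elem%z
        end select
    end do
end program demo_hetero_array
```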

Remark: The section on Arrays (2.4.5) may need some clarification in the
light of polymorphism. What exactly does it mean to talk of the same type?
Is it the compile-time type or the run-time type? Etc. etc. What is a
sub-type (and what a sub-TYPE)? Again the decision by J3 to opt for
TYPE-extension instead of class and inheritance is very dubious.

In addition I would like to see a form of allocation that allows one to
point a polymorphic object to an unnamed monomorphic object as is done with
Java's 'new' construct:

     ref(Point) :: some_point
     some_point => ALLOCATE( Point3D(x=0.0,y=0.0,z=0.0) )

(This code example is awkward here since Fortran's ALLOCATE is not a
function.)
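
A sketch of how such an allocation can be written in Fortran 2003 syntax,
which offers sourced allocation through ALLOCATE(..., SOURCE=...) rather
than a function-like ALLOCATE (shown for comparison only; it is not the
syntax proposed here):

```fortran
! Sketch only: Fortran 2003 syntax, shown for comparison.
program demo_sourced_allocation
    implicit none

    type :: point
        real :: x = 0.0, y = 0.0
    end type point

    type, extends(point) :: point3d
        real :: z = 0.0
    end type point3d

    class(point), pointer :: some_point

    ! Create an unnamed point3d object and point some_point at it.
    allocate(some_point, source=point3d(x=0.0, y=0.0, z=0.0))

    select type (some_point)
    type is (point3d)
        write(*,*) 'dynamic type is point3d'
    class default
        write(*,*) 'unexpected dynamic type'
    end select

    deallocate(some_point)
end program demo_sourced_allocation
```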

Related Issues:
---------------

1. USE statement inside Classes
This is a problematic area that would need some discussion and where I
personally haven't made up my mind.  The main problem for me is whether such
a statement would be inherited or not.

2. A new access attribute
READONLY is the most appropriate default access attribute for class
variables. READONLY would remove the need for the typical get procedures so
ubiquitous in C++ and other OO languages. The set procedures remain
essential. The emphasis here is on 'default' and 'now': any later addition
of READONLY already diminishes its usefulness since it could no longer be
made the default. (See B Meyer's Object-oriented Software Construction,
1997, for more and better arguments.)

3. Templates or generic classes
This issue has come up several times in discussions on the Fortran mail/news
groups. Just saying there is not enough time is not good enough.  J3 is not
to blame for this, but one can only wonder what could have been done by
skipping the interval arithmetic item instead.

It is also a bit disingenuous since the Fortran committees found time to add
KIND and NONKIND parameters to derived types and a non-executable SELECT
CASE construct to deal with KIND parameters.  Both constructs would be
completely unnecessary if truly generic procedures were available (just an
attempt):

   subroutine swap( a, b )
       type, generic :: T
       T :: a, b, temp
       temp = a ; a    = b ; b    = temp
   end subroutine swap

The constructs in the draft are just fiddling around with the symptoms and
would be better left out.
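
Without such a facility, the usual Fortran 90/95 workaround is one specific
procedure per type behind a generic interface. A minimal sketch, with
hypothetical specific names swap_integer and swap_real:

```fortran
! Sketch only: standard Fortran 90/95 generic interface.
module swap_module
    implicit none
    private
    public :: swap

    interface swap
        module procedure swap_integer, swap_real
    end interface swap

contains

    subroutine swap_integer(a, b)
        integer, intent(inout) :: a, b
        integer :: temp
        temp = a ; a = b ; b = temp
    end subroutine swap_integer

    ! Identical body, repeated only because the type differs.
    subroutine swap_real(a, b)
        real, intent(inout) :: a, b
        real :: temp
        temp = a ; a = b ; b = temp
    end subroutine swap_real

end module swap_module

program demo_swap
    use swap_module
    implicit none
    integer :: i = 1, j = 2
    real    :: r = 1.5, s = 2.5

    call swap(i, j)      ! resolves to swap_integer
    call swap(r, s)      ! resolves to swap_real
    write(*,*) i, j      ! i and j have exchanged values
    write(*,*) r, s
end program demo_swap
```

Every further type or kind needs another near-identical copy of the
procedure body.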

Remark: One should seriously consider replacing the DIMENSION attribute with
a generic class ARRAY as the most important form of a container (others are
queues, stacks, trees, lists, dictionaries, ...).  Very little needs
changing since Fortran's array capabilities are already much better than
those of virtually all competitors. It is just a shift of emphasis and
towards a more regular syntax.

And POINTER, too, could become a generic class instead of being an
attribute. This would allow things like this:

   pointer( ARRAY(REAL,rank=2) )    :: a  ! pointer to a 2-dim array
   array( POINTER(REAL), (/0:4/) )  :: b  ! array of pointers to real numbers

This is certainly more elegant than the current solution in Fortran 90.

4. Super
It seems that the draft's equivalent of SUPER is better and more flexible
than the common SUPER construct in other OO languages. I would only like to
warn against too much freedom here. It is, for example, a bad idea to allow
components of an object to be targeted from the outside in any way.

5. Miscellanea

There are a number of other issues that should be addressed ASAP.  Multiple
inheritance, object persistence and I/O, CORBA, a Fortran STL, support for
code documentation, design by contract (assertions), general exception
handling (not just IEEE), better memory handling support (garbage
collection), ...
But we had better leave some work for the standard after F2000.