                                                             J3/98-118

Date:    12th February 1998
To:      J3
From:    Malcolm Cohen
Subject: Polymorphism Design Decisions

This paper provides rationale for some of the design decisions reflected
in 97-230r1 (Polymorphism Specification and Illustrative Syntax), with
some discussion on various alternatives.  It represents the views of
myself only.

1. Nomenclature

There are two classes of procedure reference to be provided in Fortran
2002 that will be dynamically dispatched; the first will be provided by
the "procedure pointer" feature, the second by the "polymorphism" (sic)
feature.

1.1 Classification by binding

One can classify dynamically dispatched procedure references by the
method used to determine which procedure is called.

"value-bound" -- the value of an entity determines the procedure invoked.
    This applies to dummy procedures and procedure pointers, which one
    might consider to have a "procedure value".

"object-bound" -- these are similar to "value-bound"; the difference is
    that in an object-oriented context:
    - the "procedure value" is part of a containing object
    - the procedure invoked has access to its object, that is, the
      object it is contained within.
    [NB: In 97-230r1 and discussions I previously used "object-bound"
     as a synonym for "value-bound".  This was a little sloppy.]

"type-bound" -- the type of an entity determines the procedure invoked.
    Dynamic dispatch only occurs when entities can have runtime types
    that are not statically determined (and we have this through
    polymorphic variables with type extension).
    In an object-oriented context:
    - the "procedure value" is not part of an object, but a
      characteristic of the type of a specific object
    - the procedure invoked has access to this specific object.

2. Goals

"Dynamic dispatch, where the procedure invoked depends on the runtime
 type of a single object."

Sub-goals:
    - simple to implement
    - simple to use
    - allows object-oriented programming
    - efficient to use

Single dispatch is the simplest form of dynamic dispatch.

3. Declaration Characteristics

The proposal has the type-bound procedures listed in the body of the type
definition, with the procedures themselves outside the type definition.

Why list the procedures in the type?
    - readability and maintenance: so the programmer can look at the
      type definition and see what type-bound procedures are included.
    - simplicity: avoids the need for rules to determine what procedures
      in the defining scoping-unit are type-bound procedures.

Why not put the procedures themselves into the type?
    - it would unnecessarily bloat the type definition
    - it would complicate the situation when several related types were
      being declared, and a type-bound procedure of one type wished to
      use the other types.

Why the renaming ("PROCEDURE tbpname => procname")?
    - allows a single module to provide both a type and one or more
      extensions with specific type-bound procedure implementations for
      each.
    - will be a place for attaching a set of procedures with different
      type parameters (i.e. handles the interaction between dynamic
      dispatch and parameterised derived types).

4. Reference Syntax

97-230r1 proposes that the procedure be referenced as "variable%procname".

Some of the reasons supporting this choice are:

(a) This is consistent with the two other F2002 uses of the % token, viz
    - component access
    - type parameter access
    In each situation the % is being used to access entities in the
    scope of the type definition.

(b) It looks like the invocation of a procedure pointer component
    (whether "value-bound" or "object-bound" according to my
    terminology); this is a good thing because there is an implied
    indirection (viz dynamic dispatch) in these situations, so the cost
    of the procedure reference (in terms of additional overhead and lost
    opportunities for optimisation) is similar.

(c) This syntax is consistent with the "traditional" object-oriented
    styles of Smalltalk, where the model is one of "sending an object a
    message".
    It is also consistent with the Simula style, where the model is that
    one invokes a procedure that is inside an object.

(d) It automatically manages the namespaces, preventing the automatic
    pollution which would occur if the procedure were invoked simply by
    "procname" and eliminating the opportunity for the user to lose the
    procedures by PUBLIC/PRIVATE/ONLY/rename mistakes.

5. Access to the Object: Functionality

As mentioned above, in traditional object-oriented programming the called
procedure has access to the object through which it was invoked.

This is highly desirable, as it avoids unnecessary duplication of
argument names; here is a simple example:

!
! Suppose we have a user-written "i/o channel" abstraction with
! clever buffering etc.  Then we might have an array of i/o channels
! (to handle multiple streams), e.g.
!
      OBJECT(io_channel_type) io_channel(20)
!
! What we want to do is things like:
!
      CALL io_channel(13)%read_real_array(my_array)
!
! And we don't want to have to write
!
      CALL io_channel(13)%read_real_array(io_channel(13),my_array)
!
! instead.

6. Access to the Object: Form

Examples in this section are based on the following outline:

      TYPE,EXTENSIBLE :: my_type
        ...
      CONTAINS
        PROCEDURE add_i => my_type_add_i
      END TYPE
      ...
      CHARACTER*20 FUNCTION my_type_add_i(m,i)
        TYPE(my_type) m
        INTEGER i
        ...
      END FUNCTION
      ...
      OBJECT(my_type) x

97-230r1 proposes that this access be provided by "magically" passing the
object through which the procedure is invoked to a type-bound procedure
as its first argument; thus the syntactically first actual argument
becomes associated with the second dummy argument.
e.g.
      PRINT *,x%add_i(27)

This has raised some objections since if one invoked the same procedure
by its name directly (i.e. not through an object via dynamic dispatch) no
such magic occurs - the object must be supplied as the first argument
directly:
e.g.
      PRINT *,my_type_add_i(x,27)

However, not having all the arguments following the "operator" symbol is
not quite as foreign to Fortran as it may first appear, since we could
also have done:
e.g.
      INTERFACE OPERATOR(+)
        MODULE PROCEDURE my_type_add_i
      END INTERFACE
      ...
      PRINT *,x + 27

7. Access to the Object: Alternative Forms

It is worth noting that different object-oriented languages have provided
quite different syntactic facilities for access to the object through
which the procedure was invoked.  Here are some of the common
alternatives converted for F2002:

(a) via a special name, e.g. "THIS".
    -- this is typical of many OO languages
    e.g.
      INTEGER FUNCTION my_type_add_i(i)
      ... refer to THIS for the object

(b) name given by special syntax
    -- this is used by some other OO languages
    e.g.
      INTEGER FUNCTION (x) my_type_add_i(i)
      ... refer to X for the object

(c) name given by special clause
    -- this is an alternative syntax to (b)
    e.g.
      INTEGER FUNCTION my_type_add_i(i), OBJECT_REF(x)
      ... refer to X for the object

(d) Provide a more generalised facility - one that passes the object
    through which a procedure is invoked whether the procedure is a
    type-bound procedure or a procedure component (which thus becomes
    object-bound).
    e.g.
      TYPE,EXTENSIBLE :: my_type
        ...
      CONTAINS
        PROCEDURE,MAGIC :: add_i => my_type_add_i
        ...
      END TYPE
    where we can put "MAGIC" onto procedure components as well as
    type-bound procedures.
    [OK, I am not suggesting that the keyword should be "MAGIC"!]

(e) omit the functionality altogether

These all avoid the "argument magic", but have other disadvantages:

(a) Since we do not want reserved words in Fortran, this only really
    works if the procedure text is physically embedded in the type
    definition.  This bloats the type definition (see item 3. above).
    The procedure is only callable through an object of the type.

(b) The syntax is ugly.
    The procedure is only callable through an object of the type.

(c) The procedure is only callable through an object of the type.

(d) Additional typing needed for the most common case (viz a type-bound
    procedure wants access to its invoking object); provision of true
    "object-bound" procedures goes beyond minimalist OO.

(e) Omitting support for the most common paradigm of OO programming is
    peculiar.

Naturally my preference is for the proposal as stated rather than for one
of these alternatives; of the alternatives I find (d) to be most
acceptable, with (c) or (b) being reasonable and finally (e) and (a)
being unreasonable.

8. Procedure Signatures

The proposal states that the signature of a type-bound procedure cannot
change (other than the type of the object through which it is invoked).

This has the highly desirable characteristic of allowing compile-time
type-checking for all non-polymorphic arguments.  If this is not the
case, type-bound procedure invocation is not type-safe.

Note that no functionality is lost by this requirement; the user can
program arbitrary polymorphism (covariant, contravariant, whatever)
himself by the use of polymorphic dummy arguments, including his own
explicit type tests which enable him to handle what would be "type
errors" (under more rigid rules) in a safe fashion.

9. Final Procedures

For optimisation purposes it is desirable to be able to indicate that a
type-bound procedure cannot be (further) overridden in a further
extension of a type.  We call these "final" procedures (better
suggestions for a term here welcome - we probably ought to think of
something else to avoid any possible confusion between "final" and
"finalization"!).

The illustrative syntax in the proposal does not work (it suggests that
for a non-final procedure the dispatching [i.e. first] argument be TYPE,
and for a final procedure it should be OBJECT; in fact our argument
association rules force this argument to be OBJECT in all cases (!) and
this is not a reasonable or intuitive method to distinguish between them
anyway).

Instead, I propose that final procedures be declared with the attribute
FINAL (or whatever term we decide for this) in the type body,
e.g.
      TYPE,EXTENDS(MY_FIRST_TYPE) :: MY_SECOND_TYPE
        ...
      CONTAINS
        FINAL PROCEDURE signal => second_signal
      END TYPE

-------------------------------------------------------------------------------
...........................Malcolm Cohen, NAG Ltd., Oxford, U.K.
                           (malcolm@nag.co.uk)