From: Kurt W. Hirchert                                      J3/97-190
Subject: Procedure Identities: Variables, Pointers, or Something Else?
Meeting 141

Fortran 2000 requirement R3 has at various times been referred to as a requirement for procedure pointers or procedure variables. A description that does not prejudge this distinction might be that it is a requirement for a "changeable procedure identity" (cpi) - a named entity to which one can assign the identity of a procedure such that one can later invoke that procedure through that named entity. The main feature that distinguishes a cpi from a dummy procedure (which also fits the previous sentence) is that the duration of "assignment" to the cpi is entirely under program control and not forced to correspond to the lifetime of a procedure execution. Also, a cpi can be used to transitively define another cpi and can be compared with another cpi or a fixed procedure identity.

I believe it is fair to say that, in the judgment of the /data subgroup, the same semantics can ultimately be achieved whether one calls a cpi a procedure variable, a procedure pointer, or something else (e.g., a procedure accessor). The issues all have to do with the notation that one uses to express these semantics and the extent to which one's expectations from data variables and pointers help or hinder one in remembering this notation and the associated rules on using it.

In this document, I have attempted to summarize the specific issues brought out in subgroup discussions. I have attempted to present these issues in an evenhanded manner, but since I do have an opinion on this matter, it is entirely possible that I have not fully succeeded.

What's "Natural"?

There are conflicting arguments about which approach is most intuitive.
Proponents of the procedure pointer approach have pointed out that the likely representation of a cpi is the machine address of the code implementing the procedure and that we typically call such addresses pointers. In such a view, a "procedure variable" would be one in which one actually stores the code to implement a procedure. Proponents of the procedure variable approach counter that allocatable arrays and ISO varying strings are examples where the direct representation contains a pointer to the "real" representation and that the collective procedures of a program might be thought of as defining a giant enumeration type. They add that it is strange to have pointers to X when there are no X variables, and strange that variables must be declared targets while procedures are targets automatically. Proponents of "something else" can argue that both analogies are flawed and that it would be cleaner to start from scratch.

Declaration Syntax

A fundamental problem in the declaration area is that with ordinary data

    REAL :: a_real_variable
    REAL, PARAMETER :: a_real_constant=1.0

the simpler declaration syntax describes something that is modifiable and one adds an extra attribute to get something fixed, but that with procedures

    EXTERNAL :: a_subroutine
    REAL, EXTERNAL :: a_real_function

the simplest available syntax has already been made to mean fixed identities in existing standards, so it became necessary to add an attribute (this example is for the pointer approach)

    REAL, EXTERNAL, POINTER :: a_changeable_real_function_identity

This was found especially objectionable in the case of derived-type components, where modifiable things are the only things allowed. There was the further complication that in some cases one needed to add this extra attribute to statements that don't support multiple attributes in Fortran 90/95.

In an attempt to deal with both of these problems, the proponents of procedure variables suggested that their extra attribute (VARIABLE) be assumed in derived types.
Although slightly more defensible than making a comparable suggestion for the POINTER attribute, this had the effect of making component declaration syntax mean something different from what that same syntax would mean outside a derived type definition, so that was unpopular as well.

To solve the problem of multi-statement declarations (interface blocks) in derived type definitions, a method had been invented to give a name to a set of procedure characteristics, so that a single statement could be used to declare a procedure identity whose interface includes those characteristics. Current subgroup thinking is to treat the latter statement analogously to a type statement, with the simplest form declaring changeable procedure identities and the addition of another attribute to denote those that are fixed:

    PROCEDURE(real_function) :: a_changeable_real_function_identity
    PROCEDURE(real_function), EXTERNAL :: a_fixed_external_real_function

Note that this declaration approach can be used whether our changeable identities are called pointers or variables.

"No Procedure" Value

Past experience with existing languages that support some kind of cpi is that many applications require some means of indicating "no procedure" rather than a specific procedure (somewhat analogous to an optional dummy procedure). It would be possible simply to require programmers to create their own "no procedure" identities by creating appropriately named procedures, but most people seem to prefer the idea of a language-supplied identity for this purpose.

For the procedure pointer approach, the value NULL() is available at no cost. For the procedure variable or "something else" approaches, there is a small cost - it is necessary either to create a separate method of creating a "no procedure" value or to explicitly extend NULL to generate such non-pointer special values. (The latter approach appears to be the one currently favored by procedure variable proponents.)
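As a rough illustration of declaring a cpi through a named set of characteristics and giving it a "no procedure" value, here is a minimal sketch. It assumes the procedure pointer spelling that Fortran 2003 later standardized (plus the Fortran 2008 ERROR STOP statement), in which the POINTER attribute must be written explicitly, unlike the simplest form under discussion above. It is one possible rendering, not a statement of the subgroup's design. The names cpi_sketch, square, f, and demo are hypothetical; real_function is borrowed from the declarations above.

```fortran
! Sketch only: procedure-pointer spelling as standardized in Fortran 2003.
! cpi_sketch, square, f, and demo are hypothetical names.
module cpi_sketch
  implicit none

  ! A named set of procedure characteristics, so that a single
  ! statement can declare a procedure identity with this interface.
  abstract interface
    function real_function(x) result(y)
      real, intent(in) :: x
      real :: y
    end function real_function
  end interface

contains

  function square(x) result(y)
    real, intent(in) :: x
    real :: y
    y = x * x
  end function square

end module cpi_sketch

program demo
  use cpi_sketch
  implicit none

  ! A changeable procedure identity, initially "no procedure".
  procedure(real_function), pointer :: f => null()

  if (associated(f)) error stop "f should start as no procedure"

  f => square                 ! give f the identity of square
  if (.not. associated(f, square)) error stop "f should identify square"

  print *, f(3.0)             ! invokes square through f; the value is 9.0

  f => null()                 ! back to "no procedure"
  if (associated(f)) error stop "f should be no procedure again"
end program demo
```

Under the procedure variable approach the same steps would presumably be written with = and a comparison against NULL() rather than => and ASSOCIATED; that spelling is not valid in any published standard, so it is not shown as runnable code here.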

Integration with Dummy Procedures

With the procedure variable approach, it seems almost inevitable that one say that the existing feature called a "dummy procedure" is nothing more than a "dummy variable" where the variable is a "procedure variable" (i.e., a "dummy procedure variable"). Given the similarity between a dummy procedure and a cpi, this has the desirable effect of making them the same, so the rules for assignment/association need be written only once. However, the need to be completely compatible with the existing feature also creates complications: dummy procedures are, in effect, INTENT(IN), and many implementations take advantage of this and use a method of passing procedures that does not allow for changing the input. Thus, to allow Fortran 2000 implementations to be object-compatible with existing Fortran 90/95 implementations, we would need rules like

  - Procedure variables are INTENT(IN) by default (as opposed to the unspecified intent for all other variables).
  - If a dummy procedure variable is given an intent other than INTENT(IN), the interface must be explicit where it is called (as opposed to the intent of all other variables having no effect on explicitness).

With the procedure pointer approach, one could do this integration (albeit with more text to justify associating a fixed procedure identity with a cpi), but one also reasonably has the option of not doing this integration (to avoid the extra explicitness and intent rules), at the cost of having to explain the interaction between dummy procedures and procedure pointers. The something else approach is in about the same situation as the pointer approach.

Assignment

The procedure variable approach suggests

    cpi = procid

The procedure pointer approach suggests

    cpi => procid

The something else approach suggests

    cpi := procid

or

    call proc_assign(cpi,procid)

In the procedure variable approach, will users have problems with the difference between the following two statements?

    name1 = procid      ! This would be "name1 => procid" in the pointer approach
    name2 = procid()    ! This is an ordinary assignment in both approaches

This is, of course, the same distinction that has to be made in the following:

    call sub1(procid)
    call sub2(procid())

In the absence of new rules, the procedure variable approach could allow

    cpi = 3.141592653589793238462643

invoking a defined-assignment procedure. Is this what we want? There is some suggestion that the procedure pointer approach may also allow this, but not

    cpi => 3.141592653589793238462643

Comparison

Procedure variables:

    IF (cpi /= NULL()) CALL cpi
    IF (cpi2 /= taboo_function) x = cpi2(y)

Procedure pointers:

    IF (ASSOCIATED(cpi)) CALL cpi
    IF (.NOT.ASSOCIATED(cpi2,taboo_function)) x=cpi2(y)

Something else:

    IF (PASSOCIATED(cpi)) CALL cpi
    IF (.NOT.PASSOCIATED(cpi2,taboo_function)) x=cpi2(y)

Mostly, this is a question of what you find "prettiest", but there may be some concern in the procedure variable approach whether there would be confusion among

    IF (cpi==cpi2) CALL do_something       ! are these the same procedure?
    IF (cpi()==cpi2()) CALL do_something   ! do they return the same result?
    IF (cpi==cpi2()) CALL do_something     ! ?! cpi2 is procedure-valued:
                                           ! is cpi the same procedure as the
                                           ! procedure returned by cpi2?

This is certainly unambiguous to the compiler, so this is "merely" a question of whether programmers might be confused. In the procedure pointer approach, these three statements would look like the following:

    IF (ASSOCIATED(cpi, cpi2)) CALL do_something
    IF (cpi()==cpi2()) CALL do_something
    IF (ASSOCIATED(cpi, cpi2())) CALL do_something

Arrays of Procedures

With the procedure variable approach, an obvious question is that if we have scalar procedure variables, can we also have array procedure variables?
If so, we can ask whether one can use elements of such arrays directly

    x = proc_array(subscript)(arguments)

or only indirectly

    scalar_proc = proc_array(subscript)
    x = scalar_proc(arguments)

If direct notation is allowed, how does it interact with our notion of array expressions? Do we allow things like the following?

    array(:) = proc_array(:)(arguments)

If so, would

    array(1:n) = proc_array(1:n)(array2(1:n))

be more like

    do i=1,n; array(i) = proc_array(i)(array2(i)); end do

or

    do i=1,n; array(i) = proc_array(i)(array2(1:n)); end do

or could it be like either depending on the explicit interface of proc_array?

In a slightly different direction, could elemental procedures now accept procedure array arguments, as in the following?

    result(1:n)=integrate(function(1:n),start_point(1:n),end_point(1:n))

In yet another direction, it becomes visually ambiguous whether

    IF (cpi1(i)==cpi2(i)) CALL do_something

is comparing the results of two procedures or the identities of two procedure array elements.

The extent of such questions suggests that at least some of these options should be prohibited.

For all three approaches, one can use the circumvention of putting a cpi in a derived type and then declaring an array of that derived type:

    TYPE wrapper
      PROCEDURE(real_function) :: proc
    END TYPE wrapper
    TYPE(wrapper), DIMENSION(10) :: proc_array

This allows syntactic expression equivalent to many of the previous questions, e.g.,

    x = proc_array(subscript)%proc(arguments)

In the procedure pointer approach, many of the hard questions are moot because referencing a pointer component of an array parent is already prohibited. In the procedure variable or something else approaches, a similar restriction could be added (as a slight bump in the language).

Pointer to cpi

In the procedure variable approach, the obvious expectation is that one can make such a variable a target and then have a pointer to such variables (or allocate them).
This makes it necessary to distinguish the following:

    ptr_to_proc_var = NULL()     ! set the proc variable to "no procedure"
    ptr_to_proc_var => NULL()    ! indicate there is no procedure variable

With the other approaches, one once again uses the wrapper type, so these would be written

    ptr_to_cpi%proc => NULL()    ! set the cpi to "no procedure"
    ptr_to_cpi => NULL()         ! set the ptr to indicate no cpi

which seem less syntactically similar.

NULLIFY

Is the NULLIFY statement allowed as a synonym for setting a cpi to NULL()? (If this is allowed in the procedure variable approach, there is a true ambiguity when one tries to apply NULLIFY to a pointer to a procedure variable.)

Procedure-valued Functions

I have heard some concern expressed about whether the procedure variable approach can engender extra confusion in the case of a function that returns a procedure variable and that uses the function name as its result variable. I have been unsuccessful in constructing an example that seems any more confusing than examples that have nothing to do with procedure variables/pointers/whatever.

Conclusion (Mine, Not the Subgroup's)

None of the costs of the differences described above is individually very large. The issue for me is the collective weight of these costs.