J3/03-266r1 Date: 14 November 2003 To: J3 From: Aleksandar Donev Subject: Post F2003: ELEMENTAL Procedures and Efficiency, and PURE Reference: J3-007R3, J3/02-315 Response is in J3/03-277 ______________________________________ Summary ______________________________________ I point to some problems with efficiently using ELEMENTAL procedures in Fortran 2002 and suggest a compiling scheme that would fix this, along with some needed backward-compatible modifications to the current draft that would make this implementation more feasible. This was partly inspired by recent posts/discussions with Van Snyder, as well as past personal frustrations with elementals in Fortran 95. Problems also exist with PUREs, discussed later. NOTE: There is an extended example at the end of this paper. ___________________ ELEMENTAL ___________________ Required changes to the way ELEMENTALs work include (in order of priority): 1. Remove the constraint prohibiting elementals as actual arguments and procedure pointer targets, and, 2. Making sure that non-intrinsic elementals and non-elementals are not mixable, i.e., an elemental procedure pointer, type-bound procedure or procedure dummy argument should only be allowed to point, be bound to, or be argument associated with a non-elemental. 3. Add the dummy argument attribute SCALAR (and possibly NONSCALAR) that guarantee that the actual will be scalar. Either 4a. Allow INTENT(OUT) SCALAR arguments, or even better, 4b. Remove the purity requirement on elementals and replace with the condition that order of execution be unimportant to the user. The problem described is part of a much more general issue of writing TKR-generic procedures in Fortran (think templates in C++ or generic classes in Eiffel), but no attempt is made here to solve the general problem (it is coming shortly, after consultation with Van and others). Besides, it is too late to finally introduce genericity properly in Fortran. But it is not too late to fix elementals! The proposed implementation is only a suggestion to vendors. I am aware that it will not be high on their priority list. But at least we are making the way for such efficient implementations of elementals later. ___________________ PURE ___________________ PURE procedures are another vastly underused F95 feature. They are a very nice concept, but the basic problem is that the restrictions on PURE are too strong. For example, one cannot even raise an error flag (think exceptions), or even point a pointer to some global data (I am not saying modify the data, but just point to the data). This is especially true because ELEMENTALs are required to be PURE. The real requirement is that the order of execution of ELEMENTALs not matter. I make suggestions here on how to relax this for ELEMENTALs, but do not try to fix PURE ones. ______________________________________ The Problem ______________________________________ There is a frequently appearing task in Fortran programming which still finds no good solution: Making procedures that can be called with arrays of arbitrary rank...efficiently. Assume for the moment that these routines can be written, using array-syntax, in a way that is otherwise independent of the rank. Elementals provide this, albeit only for pure operations that completely disregard the possible "arrayness" of the arguments. More importantly, there is an efficiency problem with elementals which has unfortunately made them a very seldomly used array feature in Fortran 95 (with the exception of, of course, the intrinsics). The problem with elementals is that to really have efficiency, in a lot of cases, the loop over the array elements needs to go *inside* the elemental procedure, rather then outside a procedure call to a scalar version of the elemental. I will assume that inlining is not possible, and in the examples I give this is the case, either because dynamic binding is used with OO approaches, or because the user actually provides the elemental procedure. Vendors typically implement the loop around the call, possibly since this is easier. However, with a few modifications in the way ELEMENTAL procedures work, a few deletions of restrictions on elementals which make no sense, and a change in attitude from compiler writers, the usefulness of ELEMENTALs can significantly be increased. This issue is compounded with OO approaches, especially because of dynamic dispatch, whose overhead can dramatically be reduced if the slot-table lookup is done externally to the loop. Examples of the need for rank-indepedendent procedures include random number generation (see comments on non-purity below), as well as a whole collection of problems in which the user provides a routine which operates on scalar arguments but in reallity needs to be called on a grid of points {I attach at the end a (long) example of numerical quadrature of a one-dimensional function Int[f(x),x=a..b]}. A noticeable feature in both these examples, and in general a stumbling block to putting the loop in the elemental, is the fact that some of the arguments, typically the passed object in OOP solutions, are scalar. A commonly occuring feature is also that the user can predict this ahead of time and would be better off restricting this to be a scalar indeed. I suggest that a new attribute for dummy arguments of ELEMENTAL procedures be added, say SCALAR, which would state that this argument is not part of the "elementalness" of the procedure. I would really like to design a novel set of language features that deals with this issue of genericity, i.e. writing code which applies to different TKR actual arguments without any change. Kind of like templates in C++, but with the added complexity of the rank. However, this is a major job. Therefore, I resort to elementals, and suggest some modifications which would make them a viable replacement. ______________________________________ The Solution ______________________________________ I give an example using random number generation below. Random number generation is not pure, and the ordering of the generation does matter for the final result. However, for the user the ordering does not matter (otherwise he/she should not call the generator with an array argument but put the look around the scalar version him/herself), just like with the RANDOM_NUMBER intrinsic. I have assumed below that the restriction on SCALAR arguments being intent(in) has been removed, although this issue is of course separate from this proposal. For now, just take this as a very simple, yet illustrative, example. Here is how the declaration of an OO random number generator would look like in my proposed scheme: module Random_Number_Generation type, extensible :: random_number_generator integer :: seed=0 contains procedure(next_number), pass(generator), deferred :: next_number ! Generates the next number in the pseudorandom stream end type random_number_generator abstract interface elemental subroutine next_number(generator,number) ! Can be used to generate arrays of random numbers class(random_number_generator), SCALAR, intent(inout) :: generator real, intent(out) :: number end function next_number end abstract interface end module Random_Number_Generation For users that really care about efficiency, the above generator, without the SCALAR, would not be a satisfactory solution in F2x, due to the above efficiency concerns. The solution one would have to resort to, is a very ugly one, in which there is a generic binding for next_number, which has to overriden in every specific extension (implementation) of a random-number generator. So now every time we extend random_number_generator we have to write 8 (or maybe 9, with one special case for contiguous arrays) different specific procedures, which will inline the random-number generator manually. This produces long, obscure, macro-monsters of codes which sadly I have to write almost every day. Notice that we cannot write these rank-specific routines only once, and then reuse them in subsequent extensions, since we do not yet have access to the scalar random-number generator at this API stage. If instead, the compiler actually generated these 8/9 versions of the elemental procedure for arguments of different rank automatically (which is very simple), simplicity and efficiency would be combined in a very appealing way. So an elemental procedure will in fact become something like a dope-vector with 8/9 procedure pointers, one for scalars, one for ranks 1 through 7, and one for the case when all arrays involved are contiguous. Elemental dummy procedure arguments, procedure pointers, and type-bound elementals will in fact be implemented as such dope-vectors. Of course this is just a suggestion and there other alternatives. It should be up to the vendors to choose a strategy, and consistently use it in all cases elementals are involved. Some will raise the issue of what to do when some of the actual arguments for one of the non-SCALAR arguments are also scalar (as they can be according to the current rules for ELEMENTAL). In this case, the old strategy of simply putting the loop around the scalar version will be satisfactory, and is still available to the compiler. Or the compiler can simply replicate the scalar to an array, depending on what it chooses to do. We could also add another dummy argument attribute, say NONSCALAR, which will guarantee that the actual is an array. ______________________________________ Required Changes and Edits ______________________________________ I do ask for some help to ensure that these are complete. I state the required functionality first, and then the essential edits: 1. Remove the constraint prohibiting elementals as actual arguments and procedure pointer targets, but make sure that 2. Non-intrinsic elementals and non-elementals should not be mixable, i.e., an elemental procedure pointer, type-bound procedure or procedure dummy argument should only be allowed to point, be bound to, or be argument associated with a non-elemental. Edits: ___________________ Procedure pointers: 143:28 Delete C728 and possibly replace with: C728: If is a nonintrinsic elemental procedure, then shall have an explicit interface and be elemental. I do not think this is needed because of 144:16-20. 260: 31-32 Why is this there, and how do we need to change this? ___________________ Type-bound procedures and procedure components: 55:10+ Add (2+1/2) If the inherited binding is elemental then the overriding binding shall also be elemental. ___________________ Procedure dummy arguments: 263: 25 Remove C1229 and replace with 263: 30+ If procedure-name actual-arg is the name of a nonintrinsic elemental procedure, the dummy argument corresponding to actual-arg shall have an explicit interface and be elemental. Is this enough and is there a better place to add this? 267: 25 Delete the parenthesized ending. ___________________ 3. Add the dummy argument attribute SCALAR (and possibly NONSCALAR) that guarantee that the actual will be scalar. This will also be considered part of the procedure characteristics. I do not give detailed edits here since this is subject to interpretation. Either 4a. Allow INTENT(OUT) SCALAR arguments, or even better, 4b. Remove the purity requirement on elementals and replace with the condition that order of execution be unimportant to the user. The random-number generation example above illustrates the fact that the requirements that the results of the execution of an elemental reference be completely independent of the order of execution is too strong. Consider a case where a user just wants an error flag raised if an error occurs during the execution of an elemental reference. The user does *not* want an array of flags (think exceptions). Solution 4a above provides an alternative. ______________________________________ Extended Example ______________________________________ Here is another example which attempts to illustrate the occurence of non-inlinable ELEMENTALs in codes which require the user to provide some procedure, which will be called on an array of non-zero rank, but the user does not need to know this. It is definitely not a desirable thing to ask the user to write 8 specific routines himself, as this is a hard and error-prone task. The example is one of numerical quadrature (integration) of a user-provided function f(x) on the interval [a,b], i.e., the evaluation of a definite integral (DI). An OOP solution is given. Here f(x) needs to be evaluated on a whole grid of points (the integration mesh), which happens to be two-dimensional because it represents some kind of hierarchical fancy mesh. The code is not documented in detail. Feel free to ask me about it. First we design an abstract specification for an OOP definite integrator: module DI_Quadrature ! DI=Definite Integral type, extensible :: di_problem ! The definite integral that is needed ! This will contain user-data upon extension real :: x_lb=0.0, x_ub=1.0 ! Bounds of integral ... contains procedure(f), pass(di), deferred :: f ! User provided f(x) ... end type di_problem type, extensible :: di_integrator ! The actual numerical integrator real :: global_error=0.0 ! Estimated error ... contains procedure(Integrate), pass(integrator), deferred :: Integrate ! The actual integrator--Euler, Runge-Kutta, etc. ... end type di_problem abstract interface elemental function f(problem,x) class(di_problem), intent(inout) :: problem real, intent(in) :: x real :: f end function f subroutine Integrate(problem,integrator) class(di_problem), intent(in) :: problem class(di_integrator), intent(inout) :: integrator end subroutine Integrate end interface end module DI_Quadrature Now we actually implement a specific fancy integration scheme: module DI_Fancy_Quadrature ! Specific implementation: Fancy Fancy quadrature for DI's use DI_Quadrature type, extends(di_integrator) :: fancy_di_integrator integer :: n_points=0 ! Number of needed points along x axis real, dimension(:,:), allocatable :: x ! These are rank-2 for *this* integrator only ... contains procedure :: Integrate=>IntegrateFancy5 ... end type fancy_di_integrator contains subroutine IntegrateFancy5(problem,integrator) class(di_problem), intent(in) :: problem class(fancy_di_integrator), intent(inout) :: integrator real, dimension(5,integrator%n_points), allocatable :: f_x ! f(x) on a fancy grid ... f_x=problem%f(integrator%x) ! Call the elemental routine ! This will call the rank-2 specific ``elemental'' ... end subroutine end module DI_Fancy_Quadrature The user on the other hand writes a description of his integration problem (in this case f(x)=a*x+x^b) for some user-specified a and b: module My_DI use DI_Quadrature ! We do *not* use DI_Fancy_Quadrature since this may not yet be written type, extends(di_problem) :: my_di_problem real :: a=1.0, b=1.0 ! *Toy* example ! In real life there would be something *complicated* here ! such as a Monte-Carlo simulator to obtain f(x) contains procedure :: f=>my_f end type my_di_problem contains elemental function my_f(di_problem,x) class(my_di_problem), intent(inout) :: my_di real, intent(in) :: x real :: f f=my_di%a*x+x**(my_di%b) ! f(x)=ax+x^b end function my_f end module My_DI And here is how this can actually be used in a program: program DI_Example use DI_Quadrature use DI_Fancy_Quadrature type(my_di_problem) :: problem type(fancy_di_integrator) :: integrator problem%a=2.0; problem%b=-2.0 // Make a specific DI ... call integrator%integrate(problem) ! You can also try another integrator if you have one ... end program DI_Example