From:  Kurt W. Hirchert                        J3/97-265 (Page  of 2)
Subject:  Example of Polymorphism Issue                    Meeting 143


This  is  intended  to be a slightly more realistic illustration  of  the
polymorphism  issue described in J3/97-261, with some estimation  of  the
costs of the alternatives and tradeoffs.  Let us begin with an extensible
type representing 2D vectors with two type-bound operations: obtaining
the  length  of  a single vector and computing the cosine  of  the  angle
between two such vectors:
     TYPE,EXTENSIBLE::vector_2d
       REAL::x,y
     CONTAINS
       length=>length_2d
       corr=>corr_2d
     END TYPE
     
     FUNCTION length_2d(v) RESULT(length)
       TYPE(vector_2d)::v; REAL:: length
       length=sqrt(v%x**2+v%y**2)
     END FUNCTION length_2d
     FUNCTION corr_2d(v1,v2) RESULT(corr)
       TYPE(vector_2d)::v1,v2; REAL::corr
       corr=(v1%x*v2%x+v1%y*v2%y)/(v1%length()*v2%length())
     END FUNCTION corr_2d
Using  this  type,  we can write a subroutine that  prints  out  all  the
cosines for all combinations of vectors drawn from a list of vectors:
     SUBROUTINE print_corr_table(vlist)
       OBJECT(vector_2d)::vlist(:); INTEGER::j,k
       DO j=1,size(vlist)
         print *,(vlist(j)%corr(vlist(k)),k=1,j)
       END DO
     END SUBROUTINE print_corr_table
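
For readers who want to run the example, the following is a minimal sketch of the same type and subroutine in the syntax of later standard Fortran (2003, plus ERROR STOP from 2008), which differs from the proposal syntax used in this paper.  It assumes CLASS in place of OBJECT and PROCEDURE bindings in place of the bare name=>procedure lines.  The module name, the program, and the test vectors are illustrative only.

```fortran
module vector_2d_mod
  implicit none
  type :: vector_2d
    real :: x, y
  contains
    procedure :: length => length_2d
    procedure :: corr => corr_2d
  end type vector_2d
contains
  function length_2d(v) result(length)
    class(vector_2d), intent(in) :: v   ! passed object must be CLASS in Fortran 2003
    real :: length
    length = sqrt(v%x**2 + v%y**2)
  end function length_2d

  function corr_2d(v1, v2) result(corr)
    class(vector_2d), intent(in) :: v1
    type(vector_2d), intent(in) :: v2   ! non-polymorphic, as in the paper's corr_2d
    real :: corr
    ! Cosine of the angle: dot product over the product of the lengths.
    corr = (v1%x*v2%x + v1%y*v2%y) / (v1%length()*v2%length())
  end function corr_2d

  subroutine print_corr_table(vlist)
    class(vector_2d), intent(in) :: vlist(:)   ! polymorphic, like OBJECT(vector_2d)
    integer :: j, k
    do j = 1, size(vlist)
      print *, (vlist(j)%corr(vlist(k)), k = 1, j)
    end do
  end subroutine print_corr_table
end module vector_2d_mod

program demo_2d
  use vector_2d_mod
  implicit none
  type(vector_2d) :: vlist(3)
  vlist = [vector_2d(1.0, 0.0), vector_2d(0.0, 2.0), vector_2d(3.0, 3.0)]
  ! Sanity checks: perpendicular vectors give 0, a 45-degree pair gives sqrt(1/2).
  if (abs(vlist(2)%corr(vlist(1))) > 1.0e-5) error stop "expected cosine 0"
  if (abs(vlist(3)%corr(vlist(1)) - sqrt(0.5)) > 1.0e-5) error stop "expected cosine sqrt(1/2)"
  call print_corr_table(vlist)
end program demo_2d
```
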
There are two costs to declaring vlist with OBJECT rather than TYPE
(thus making this procedure polymorphic):

1.   The  reference  to  vlist(j)%corr must be made  through  a  run-time
     dispatch  table, at the cost of an additional memory  reference  per
     function reference.  In this particular example, a smart compiler
     might recognize that the function address will be invariant in the
     loop on j and keep that function address in a register.

2.   For  alternative B (the one similar to Ada 95), a naive analysis  of
     the  types  in  the  reference to vlist(j)%corr  would  suggest  the
     possibility of a run-time type mismatch.  If you wish to  run  in  a
     mode  where such errors are caught, the necessary code (assuming  an
     appropriate  representation of the type information)  would  involve
     verifying that one integer in memory is greater than or equal to
     another integer in memory and then that one address stored in memory
     is equal to another address stored in memory.  In this particular
     case, the two integers would come from the same place in memory, as
     would the two addresses, so a smart compiler might recognize that
     the correctness condition is always true and optimize the test away.

     Under  alternative A (the one similar to C++), a naive  analysis  of
     the  types in that reference will show them to be safe, so no safety
     test would be necessary.
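
The two-step test in item 2 (an integer comparison followed by an address comparison) can be sketched as follows.  The paper does not specify the representation of the type information, so this is only one plausible layout, assumed for illustration: each type descriptor records its extension depth and a table of its ancestors indexed by depth, and integer identifiers stand in for descriptor addresses.  All names here (type_info, extends_or_is, max_depth) are hypothetical.

```fortran
module type_info_mod
  implicit none
  integer, parameter :: max_depth = 8
  type :: type_info
    integer :: id = 0                    ! stands in for the descriptor's address
    integer :: depth = 0                 ! 1 for a base type, 2 for its extensions, ...
    integer :: ancestors(max_depth) = 0  ! ancestors(d) = id of the ancestor at depth d
  end type type_info
contains
  ! True when the type described by "actual" is "required" or an extension of it.
  pure logical function extends_or_is(actual, required)
    type(type_info), intent(in) :: actual, required
    extends_or_is = .false.
    if (actual%depth >= required%depth) then                             ! integer test
      extends_or_is = actual%ancestors(required%depth) == required%id    ! "address" test
    end if
  end function extends_or_is
end module type_info_mod

program demo_check
  use type_info_mod
  implicit none
  type(type_info) :: t2d, t3d, other
  ! vector_2d: a base type whose only "ancestor" entry is itself.
  t2d%id = 1;   t2d%depth = 1;   t2d%ancestors(1) = 1
  ! vector_3d: extends vector_2d.
  t3d%id = 2;   t3d%depth = 2;   t3d%ancestors(1:2) = [1, 2]
  ! An unrelated base type.
  other%id = 3; other%depth = 1; other%ancestors(1) = 3

  if (.not. extends_or_is(t3d, t2d)) error stop "3D should be accepted as 2D"
  if (.not. extends_or_is(t3d, t3d)) error stop "3D should be accepted as 3D"
  if (extends_or_is(t2d, t3d))       error stop "2D should be rejected as 3D"
  if (extends_or_is(other, t2d))     error stop "unrelated type should be rejected"
  print *, "all type checks behaved as expected"
end program demo_check
```

In the reference vlist(j)%corr(vlist(k)), both operands of each comparison would be loaded from the descriptor shared by the elements of vlist, which is why the test is trivially true there.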

Let  us now extend this type (and these operations) to a 3D vector  type.
Under alternative B (like Ada 95), this is easy:
     TYPE,EXTENDS(vector_2d)::vector_3d
       REAL::z
     REPLACES
       length=>length_3d
       corr=>corr_3d
     END TYPE
     
     FUNCTION length_3d(v) RESULT(length)
       TYPE(vector_3d)::v; REAL:: length
       length=sqrt(v%x**2+v%y**2+v%z**2)
     END FUNCTION length_3d
     FUNCTION corr_3d(v1,v2) RESULT(corr)
       TYPE(vector_3d)::v1,v2; REAL::corr
       corr=(v1%x*v2%x+v1%y*v2%y+v1%z*v2%z)/(v1%length()*v2%length())
     END FUNCTION corr_3d
The   polymorphic   procedure  print_corr_table  can  operate   just   as
efficiently  on  an array of type vector_3d as it does  on  one  of  type
vector_2d.

Under  alternative  A  (like C++), things do not  go  so  smoothly.   The
required type for v2 in our vector_3d version of corr would be vector_2d.
We  can  force  a type match with this type, but we would then  not  have
access  to the z component of v2, and we need that access to compute  the
right formula.  To get around this limitation, we must go back to corr_2d
and   change   the   declaration   of   v2   from   TYPE(vector_2d)    to
OBJECT(vector_2d), permitting us to associate a vector_3d actual argument
without losing access to its additional component.  If we made no further
changes to corr_2d, we would have changed the reference v2%length()  from
something  that  can  be resolved at compile-time to something  requiring
dynamic dispatch, but we can fix that problem by rewriting that reference
as v2%vector_2d%length().  The text for corr_3d cannot be corrected so
easily.  It is still not syntactically correct to reference v2%z.  We
must first coerce v2 from OBJECT(vector_2d) to TYPE(vector_3d).  One way
to do that would be as follows:
     FUNCTION corr_3d(v1,v2) RESULT(corr)
       TYPE(vector_3d)::v1; OBJECT(vector_2d)::v2; REAL::corr
       corr=corr_3d_helper(v1,v2)
     END FUNCTION corr_3d
     FUNCTION corr_3d_helper(v1,v2) RESULT(corr)
       TYPE(vector_3d)::v1,v2; REAL::corr
       corr=(v1%x*v2%x+v1%y*v2%y+v1%z*v2%z)/(v1%length()*v2%length())
     END FUNCTION corr_3d_helper
The  costs  here are an extra level of procedure reference and  a  safety
check   that  the  OBJECT(vector_2d)  v2  in  corr_3d  can  be  correctly
associated with the TYPE(vector_3d) v2 in corr_3d_helper.  Presumably,  a
smart  compiler could eliminate the cost of the extra level of  procedure
reference by inlining corr_3d_helper in corr_3d.
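
The coercion described above can also be sketched in the syntax of later standard Fortran (2003, plus ERROR STOP from 2008), where an overriding procedure must keep the declarations of the non-passed arguments of the procedure it overrides, much as under alternative A.  This is an illustration under that assumption, not the proposal syntax of this paper: SELECT TYPE takes the place of the helper function, and corr_2d calls length_2d directly in place of the v2%vector_2d%length() form.  The module and program names are hypothetical.

```fortran
module vectors_mod
  implicit none
  type :: vector_2d
    real :: x, y
  contains
    procedure :: length => length_2d
    procedure :: corr => corr_2d
  end type vector_2d

  type, extends(vector_2d) :: vector_3d
    real :: z
  contains
    procedure :: length => length_3d   ! overrides the inherited binding
    procedure :: corr => corr_3d
  end type vector_3d
contains
  function length_2d(v) result(length)
    class(vector_2d), intent(in) :: v
    real :: length
    length = sqrt(v%x**2 + v%y**2)
  end function length_2d

  function corr_2d(v1, v2) result(corr)
    class(vector_2d), intent(in) :: v1, v2   ! v2 polymorphic so corr_3d can receive a 3D vector
    real :: corr
    ! Calling length_2d directly avoids dynamic dispatch.
    corr = (v1%x*v2%x + v1%y*v2%y) / (length_2d(v1)*length_2d(v2))
  end function corr_2d

  function length_3d(v) result(length)
    class(vector_3d), intent(in) :: v
    real :: length
    length = sqrt(v%x**2 + v%y**2 + v%z**2)
  end function length_3d

  function corr_3d(v1, v2) result(corr)
    class(vector_3d), intent(in) :: v1
    class(vector_2d), intent(in) :: v2   ! must match the declaration in corr_2d
    real :: corr
    select type (v2)                     ! the run-time safety check
    class is (vector_3d)
      corr = (v1%x*v2%x + v1%y*v2%y + v1%z*v2%z) / (v1%length()*v2%length())
    class default
      error stop "corr_3d: v2 is not a vector_3d"
    end select
  end function corr_3d
end module vectors_mod

program demo_3d
  use vectors_mod
  implicit none
  type(vector_3d) :: a, b
  a = vector_3d(1.0, 0.0, 0.0)
  b = vector_3d(1.0, 0.0, 1.0)
  ! The angle between a and b is 45 degrees, so the cosine is sqrt(1/2).
  if (abs(a%corr(b) - sqrt(0.5)) > 1.0e-5) error stop "unexpected cosine"
  print *, a%corr(b)
end program demo_3d
```

The SELECT TYPE construct carries the same safety check as the association of v2 with the helper's dummy argument, but without the extra level of procedure reference.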

Let us summarize the differences between the alternative A (like C++) and
alternative B (like Ada 95) versions of this example:

1.   They involved a similar number of safety checks, albeit at different
     levels.  In this particular example, the alternative B safety checks
     could  be  optimized  away, but I am certain  there  would  be  many
     variants where this would not be the case.

2.   The  direct  implementation  costs for alternative  A  are  slightly
     higher  (because  of  the  need to coerce the  OBJECT(vector_2d)  to
     TYPE(vector_3d)), but it should be possible to optimize away those
     extra costs.

3.   The  alternative  A  version is a bit more  verbose  (more  work  to
     write).

The  reason alternative B is less expensive for this example is that  the
natural extension of our operation between two 2D vectors is an operation
between  two  3D  vectors.  If we had chosen an operation  whose  natural
extension was an operation between a 3D vector and a 2D vector, alternative
A  would  have  been cheaper and less verbose.  (I haven't been  able  to
think of such an example, but maybe you can.)

Thus,  it  would  appear  that the following questions  are  relevant  in
deciding among the alternatives:

o    How important is it that Fortran follow the example of well-known
     languages like C++ in this regard?
     
     *    What fraction of Fortran programmers will know these languages?
     
     *    Of them, how many will know these particular details?

     [I suspect that relatively few people that have learned C++  or Java
     well  will  choose  to  switch to Fortran for doing  object-oriented
     programming,  so  I  feel following their lead is  not  particularly
     important in avoiding surprise.]

o    What proportion of real programs are like our example, better served
     by alternative B?

     [I  have  been unable to think of a real example that wasn't  better
     served by alternative B.]

o    How  significant  are  the differences in implementation  speed  and
     program verbosity?

     [The implementation speed differences are small enough to be ignored
     in  nearly  all  cases.   It is the differences  in  verbosity  that
     disturb me.]