J3/10-164
To: J3
From: Malcolm Cohen
Subject: Final discussion/motivation
Date: 2010 June 01


1. Introduction

I have several interpretation requests on the finalization facility.
However, these do not completely cover my doubts as to whether it is
correctly described in the standard.

One of the "poster children" for this facility was the support of dynamic
types like those provided by ISO_VARYING_STRING.  This paper attempts to
construct such a type with finalization and discusses its shortcomings.

I note that although ISO_VARYING_STRING itself is better provided for by
the use of ALLOCATABLE components, for more complicated dynamic types that
is (allegedly) not necessarily the case, meaning we need finalization.
However, more complicated cases are even more likely to be problematic than
the simple case (and certainly harder to discuss), so I have chosen this
type for the initial discussion.
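
For comparison, the ALLOCATABLE-component alternative mentioned above might
look like the following minimal sketch.  The names alloc_string_sketch,
astring, new_astring and get_value are hypothetical and appear nowhere in
ISO_VARYING_STRING.  The point is only that intrinsic assignment copies an
allocatable component, so no final subroutine is needed and an array
constructor or SPREAD result does not share storage with a variable.

```fortran
! Sketch only: all names here are hypothetical, for illustration.
MODULE alloc_string_sketch
  PRIVATE
  TYPE,PUBLIC :: astring
    PRIVATE
    CHARACTER(:),ALLOCATABLE :: value
  END TYPE
  PUBLIC new_astring, get_value
CONTAINS
  ELEMENTAL TYPE(astring) FUNCTION new_astring(ch) RESULT(r)
    CHARACTER(*),INTENT(IN) :: ch
    r%value = ch            ! automatic allocation to LEN(ch)
  END FUNCTION
  ! Accessor added only so the program below can check the values.
  FUNCTION get_value(s) RESULT(ch)
    TYPE(astring),INTENT(IN) :: s
    CHARACTER(:),ALLOCATABLE :: ch
    ch = s%value
  END FUNCTION
END MODULE

PROGRAM alloc_sketch
  USE alloc_string_sketch
  TYPE(astring) :: b, e(2), s(2)
  b = new_astring('abc')
  e = [ b, b ]              ! each element gets its own copy of the value
  s = SPREAD(b,1,2)         ! likewise for the SPREAD result
  IF (get_value(e(1)) /= 'abc' .OR. get_value(e(2)) /= 'abc') ERROR STOP 1
  IF (get_value(s(2)) /= 'abc') ERROR STOP 2
  IF (get_value(b) /= 'abc') ERROR STOP 3   ! b itself is untouched
  PRINT *, 'ok'
END PROGRAM
```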


2. Goals of the type

The ISO_VARYING_STRING module as originally written used pointer
components, but leaked memory.  The goal of a type with finalization is to
leak memory less, and if possible not at all; it would certainly be
acceptable to leak memory in a small number of pathological cases.

Obviously unacceptable behaviour includes things like deallocating memory
more than once, or deallocating memory still in use.

The module will omit nearly all of the actual operations of
ISO_VARYING_STRING, just having enough to see how well it works.


3. The example module

MODULE m8_varying_string
  PRIVATE
  TYPE,PUBLIC :: m8string
    PRIVATE
    CHARACTER(:),POINTER :: value => NULL()
  CONTAINS
    PROCEDURE,PRIVATE :: m8s_asgn_m8s
    GENERIC :: ASSIGNMENT(=) => m8s_asgn_m8s
    PROCEDURE,PRIVATE :: m8s_concat_m8s
    GENERIC :: OPERATOR(//) => m8s_concat_m8s
    FINAL :: m8szap
  END TYPE
  PUBLIC new_m8string
CONTAINS
  ELEMENTAL TYPE(m8string) FUNCTION new_m8string(ch) RESULT(r)
    CHARACTER(*),INTENT(IN) :: ch
    ALLOCATE(CHARACTER(LEN(ch))::r%value)
    r%value = ch
  END FUNCTION
  ELEMENTAL SUBROUTINE m8s_asgn_m8s(a,b)
    CLASS(m8string),INTENT(OUT) :: a
    CLASS(m8string),INTENT(IN) :: b
    ! Do not need to deallocate a%value because a is already finalized.
    ALLOCATE(CHARACTER(LEN(b%value))::a%value)
    a%value = b%value
  END SUBROUTINE
  ELEMENTAL TYPE(m8string) FUNCTION m8s_concat_m8s(a,b) RESULT(r)
    CLASS(m8string),INTENT(IN) :: a,b
    ALLOCATE(CHARACTER(LEN(a%value)+LEN(b%value))::r%value)
    r%value = a%value//b%value
  END FUNCTION
  ELEMENTAL SUBROUTINE m8szap(x)
    TYPE(m8string),INTENT(INOUT) :: x
    IF (ASSOCIATED(x%value)) DEALLOCATE(x%value)
  END SUBROUTINE
END MODULE

Looks simple, what could possibly go wrong?


4. Array constructors are a problem

Example 0: Function references

Consider the code sample
  x = ... new_m8string('abc') ... new_m8string('abc') ...
i.e. some expression that has the same call to new_m8string more than
once.  It might be argued that the processor may evaluate this function
just once and use the value in two places.

Since each reference to the function allocates a new target, no reference
ever returns the same value as a previous reference.  This could be true of
many other functions too, so there seems to be a flaw in our permission
logic here.

Example 1: Array constructors and function references

  TYPE(m8string) a(3)
  a = [ (new_m8string('xyz'),i=1,3) ]

This will call new_m8string 3 times (ignoring the issue raised in Example 0)
and place those three values in a (constructed) array.  This gives us,
conceptually,
  TYPE(m8string) funtmp1,funtmp2,funtmp3,arraytmp(3)
  CHARACTER(3),POINTER :: chval1,chval2,chval3
where each chvalN has been allocated and assigned the value 'xyz',
and the values of the other temporaries are as follows:
  funtmp1%value => chval1
  funtmp2%value => chval2
  funtmp3%value => chval3
  arraytmp(1)%value => chval1
  arraytmp(2)%value => chval2
  arraytmp(3)%value => chval3

According to 4.5.6, what happens next is finalization of the function
reference results and finalization of the array constructor (not in any
particular order).  WLOG, consider the sequence
  CALL m8szap(funtmp1)
  CALL m8szap(funtmp2)
  CALL m8szap(funtmp3)
  CALL m8szap(arraytmp) ! elemental
The first three calls to m8szap will deallocate chval1, chval2 and chval3,
and the fourth one will again try to deallocate chval1, chval2 and chval3;
this is obviously a problem!  (In reality, the pointer components of
arraytmp have become "undefined" - and some processors might detect this
and abort the program, other processors might double-deallocate the memory
and even more fun will be had later.)

Example 2: Array constructors and variables

  TYPE(m8string) b,c,d,e(6)
  b = new_m8string('abc')
  c = new_m8string('the quick brown fox')
  d = new_m8string('')
  e = [ b,c,d,b,c,d ]

Here, for the sake of discussion we give the names bval, cval and dval to
the allocated objects pointed to by b%value, c%value and d%value.  We will
call the value of the array constructor ac2, and we have
  ac2(1)%value => bval
  ac2(2)%value => cval
  ac2(3)%value => dval
  ac2(4)%value => bval
  ac2(5)%value => cval
  ac2(6)%value => dval

Following the execution of the assignment statement, the array constructor
is finalized.  This will deallocate bval, cval, and dval twice each,
incidentally destroying the variables b, c, and d.


5. Why are array constructors problematic?

Quite simply, because they construct their value "intrinsically", bypassing
any user-defined assignment.  One could imagine there might be a simple
principle here: if the user didn't need to run any code to make the value,
he doesn't *need* to run any code when the value "goes away" either.

There really doesn't seem to be any way around this other than by simply
not finalizing the array constructor.  Note that this gives the correct
behaviour: the variables would not be deconstructed, and the function
results would be zapped exactly once each.

Another obvious fix that would work (but is unavailable) would be to make
the array constructor assign values to its elements by "normal" assignment
i.e. any relevant user-defined assignment should be called.  If we did
that, then the array constructors should be finalized.  We could not do
that, because it would have been incompatible with Fortran 90 and 95.  In
hindsight, we could have done it for type-bound assignment anyway, but it
would have been an unusual difference between type-bound assignment and
interface-block assignment, so not terribly desirable anyway.
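
Pending any change to the standard, a user can approximate the "normal
assignment" behaviour by hand, filling the array element by element so that
the defined assignment runs for each element and no array constructor value
ever exists.  The following is a minimal sketch under stated assumptions:
module m8_sketch is a cut-down copy of the module in section 3, get_value is
a hypothetical accessor added only so the result can be checked, and the
defined assignment is written as a plain (non-elemental) subroutine because
some current compilers reject a polymorphic INTENT(OUT) dummy argument in a
pure procedure.

```fortran
! Sketch only: cut-down copy of the section 3 module.  get_value is a
! hypothetical accessor; the assignment is deliberately not ELEMENTAL.
MODULE m8_sketch
  PRIVATE
  TYPE,PUBLIC :: m8string
    PRIVATE
    CHARACTER(:),POINTER :: value => NULL()
  CONTAINS
    PROCEDURE,PRIVATE :: m8s_asgn_m8s
    GENERIC :: ASSIGNMENT(=) => m8s_asgn_m8s
    FINAL :: m8szap
  END TYPE
  PUBLIC new_m8string, get_value
CONTAINS
  ELEMENTAL TYPE(m8string) FUNCTION new_m8string(ch) RESULT(r)
    CHARACTER(*),INTENT(IN) :: ch
    ALLOCATE(CHARACTER(LEN(ch))::r%value)
    r%value = ch
  END FUNCTION
  SUBROUTINE m8s_asgn_m8s(a,b)
    CLASS(m8string),INTENT(OUT) :: a   ! finalized on entry
    CLASS(m8string),INTENT(IN) :: b
    ALLOCATE(CHARACTER(LEN(b%value))::a%value)
    a%value = b%value
  END SUBROUTINE
  ELEMENTAL SUBROUTINE m8szap(x)
    TYPE(m8string),INTENT(INOUT) :: x
    IF (ASSOCIATED(x%value)) DEALLOCATE(x%value)
  END SUBROUTINE
  FUNCTION get_value(s) RESULT(ch)
    TYPE(m8string),INTENT(IN) :: s
    CHARACTER(:),ALLOCATABLE :: ch
    ch = s%value
  END FUNCTION
END MODULE

PROGRAM elementwise_sketch
  USE m8_sketch
  TYPE(m8string) :: a(3)
  INTEGER :: i
  ! Instead of a = [ (new_m8string('xyz'),i=1,3) ]:
  DO i = 1, 3
    a(i) = new_m8string('xyz')   ! defined assignment copies the value
  END DO
  DO i = 1, 3
    IF (get_value(a(i)) /= 'xyz') ERROR STOP 1
  END DO
  PRINT *, 'ok'
END PROGRAM
```

Each function result is finalized once after its assignment statement, and
each a(i)%value has its own target, so nothing is deallocated twice.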


6. Does this happen in other situations?

Sadly, yes.

Example 3: ALLOCATE with SOURCE=.

  TYPE(m8string) :: a
  TYPE(m8string),POINTER :: p
  a = new_m8string('oh no, not again!')
  ALLOCATE(p,SOURCE=a)
  ...
  DEALLOCATE(p)

The ALLOCATE results in ASSOCIATED(a%value,p%value) being true, so
the DEALLOCATE destroys variable a.

The same thing happened here: we imagined that we wanted simple value
semantics for SOURCE= (like we have for structure constructors and array
constructors), and did not realise the interaction with FINAL would be
problematic in this way.
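
A user-side way to avoid the sharing is to allocate p without SOURCE= and
then copy the value with the defined assignment.  The following is a minimal
sketch under stated assumptions: module m8_sketch is a cut-down copy of the
module in section 3, get_value is a hypothetical accessor added only so the
result can be checked, and the defined assignment is written as a plain
(non-elemental) subroutine because some current compilers reject a
polymorphic INTENT(OUT) dummy argument in a pure procedure.

```fortran
! Sketch only: cut-down copy of the section 3 module.  get_value is a
! hypothetical accessor; the assignment is deliberately not ELEMENTAL.
MODULE m8_sketch
  PRIVATE
  TYPE,PUBLIC :: m8string
    PRIVATE
    CHARACTER(:),POINTER :: value => NULL()
  CONTAINS
    PROCEDURE,PRIVATE :: m8s_asgn_m8s
    GENERIC :: ASSIGNMENT(=) => m8s_asgn_m8s
    FINAL :: m8szap
  END TYPE
  PUBLIC new_m8string, get_value
CONTAINS
  ELEMENTAL TYPE(m8string) FUNCTION new_m8string(ch) RESULT(r)
    CHARACTER(*),INTENT(IN) :: ch
    ALLOCATE(CHARACTER(LEN(ch))::r%value)
    r%value = ch
  END FUNCTION
  SUBROUTINE m8s_asgn_m8s(a,b)
    CLASS(m8string),INTENT(OUT) :: a   ! finalized on entry
    CLASS(m8string),INTENT(IN) :: b
    ALLOCATE(CHARACTER(LEN(b%value))::a%value)
    a%value = b%value
  END SUBROUTINE
  ELEMENTAL SUBROUTINE m8szap(x)
    TYPE(m8string),INTENT(INOUT) :: x
    IF (ASSOCIATED(x%value)) DEALLOCATE(x%value)
  END SUBROUTINE
  FUNCTION get_value(s) RESULT(ch)
    TYPE(m8string),INTENT(IN) :: s
    CHARACTER(:),ALLOCATABLE :: ch
    ch = s%value
  END FUNCTION
END MODULE

PROGRAM source_sketch
  USE m8_sketch
  TYPE(m8string) :: a
  TYPE(m8string),POINTER :: p
  a = new_m8string('oh no, not again!')
  ALLOCATE(p)       ! p%value is default-initialized to NULL()
  p = a             ! defined assignment: p%value gets its own target
  DEALLOCATE(p)     ! finalizes p only; a%value is a different target
  IF (get_value(a) /= 'oh no, not again!') ERROR STOP 1
  PRINT *, 'ok'
END PROGRAM
```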

Example 4: The TRANSFER intrinsic.

  TYPE(m8string) :: a,b
  a = new_m8string('ugh')
  b = TRANSFER(TRANSFER(a,(/1/)),b)

This use of TRANSFER is explicitly guaranteed to have the same value as a.
If, for the sake of argument, we call the result of the outer TRANSFER
  TYPE(m8string) transtmp
what we then have is that ASSOCIATED(transtmp%value,a%value) is true.

That's all fine for the assignment to b (which makes a copy via our
defined assignment), but the finalization of the outer TRANSFER will
deallocate transtmp%value thus destroying the value of a.

Example 5: The SPREAD intrinsic.

  TYPE(m8string) :: a,b(2)
  a = new_m8string("we're all doomed")
  b = SPREAD(a,1,2)

Obviously, this is going to produce an array temp with both elements having
their value components associated with the target of variable a;
finalization of the function result will not only destroy the variable a
but also deallocate the same block of memory twice.

Example 6: The RESHAPE intrinsic.

Constructing this example is left as an exercise for the reader.

Example 7: Other intrinsic procedures.

Obviously EOSHIFT, CSHIFT, PACK and UNPACK are all similarly afflicted.  In
fact, unless I am very much mistaken, so is any intrinsic function that
returns a value of finalizable type!


7. How can we fix this?

The cases in section 6 are somewhat harder to fix.  Fortunately, they are
less high-profile than array constructors.

One possibility for ALLOCATE+SOURCE= would be to bite the incompatibility
bullet and declare that it does use defined assignment.  It's not possible
to forbid SOURCE= for finalizable types syntactically, as the dynamic type
of a polymorphic object might have a final subroutine even when the
declared type does not.

For TRANSFER, it would be tempting just to say "don't do that", but the
fact that other intrinsic functions are affected makes that somewhat less
appealing.

For all the affected intrinsic functions, we could just say that their
results are not finalized.  That does appear to solve the problem.


8. Summary

I am sure this is not the end of the discussion, but perhaps it is the
beginning.  For a start, by limiting the discussion to my
ISO_VARYING_STRING analogue, we have completely skipped any mention of
structure constructors (since the type does not expose one).  If it did
expose one, then my analysis is that the problems (and solutions) would be
identical to those for the array constructors outlined above, and for the
same reasons.

Note that the constructor problems are the subject of an interp request,
but the other problems are not.  Yet.

===END===