To: J3                                                     J3/24-136r1
From: Malcolm Cohen
Subject: DIN-4: Generic processing of assumed-rank objects
Date: 2024-June-24


1. Introduction

DIN-4 suggests
    "Allow generic processing of assumed-rank arguments,
     possibly under suitable restrictions."

That is a very vague requirement.
This paper suggests some concrete possibilities.

Note that this is very much a first draft. Without doubt, it is not a
complete set of proposals. It may, however, be the start of something
small enough that we might be able to do it and get it right in one
revision.


2. Assumed size

The fly in the ointment for most proposals is the possibility that
the assumed-rank object might be associated with an assumed-size array.
There are four obvious solutions. Whatever is chosen (which might not be
one of these four) should be applied consistently to every new context.

    (1) Don't allow anything that does not work for an assumed-size array.
        This is approximately the current state of affairs, and is hardly
        adequate.
    (2) Use the bounds reported by LBOUND and UBOUND for assumed-size.
        That produces a zero extent in the final dimension, so nothing will
        happen in the case of assumed-size. Hello silent wrong answers.
    (3) Raise a runtime error if the "generic processing" is applied to an
        assumed-size array. This has the advantage of avoiding silent wrong
        answers, but the disadvantage that the user program cannot, in
        general, catch and handle the error.
    (4) Only permit the generic processing in the RANK DEFAULT block of a
        SELECT RANK that has a RANK(*) block. This forces the user program
        to handle assumed-size specifically. It might be too much of a
        straitjacket though.

Of those obvious solutions, both (3) and (4) seem reasonable.
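To make the difference between options (2) and (3) concrete, here is a
small Python model. It is not Fortran, and the class and function names
are hypothetical. Following the description of (2), it assumes the callee
sees a final extent of zero for an assumed-size object, and it sums all
elements under each option.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AssumedRankArg:
    """Hypothetical model of an assumed-rank dummy argument.

    extents holds the caller's true shape and data its elements in array
    element order; for an assumed-size actual the callee cannot see the
    true final extent."""
    extents: Tuple[int, ...]
    data: List[float]
    assumed_size: bool = False


def visible_extents(a: AssumedRankArg) -> Tuple[int, ...]:
    # What the callee learns from the reported bounds; per option (2) the
    # final extent of an assumed-size object comes out as zero.
    if a.assumed_size:
        return a.extents[:-1] + (0,)
    return a.extents


def element_count(extents: Tuple[int, ...]) -> int:
    n = 1
    for e in extents:
        n *= e
    return n


def sum_option2(a: AssumedRankArg) -> float:
    # Trust the reported bounds: an assumed-size object silently sums to 0.
    return sum(a.data[:element_count(visible_extents(a))])


def sum_option3(a: AssumedRankArg) -> float:
    # Refuse assumed-size at run time instead of returning a wrong answer.
    if a.assumed_size:
        raise RuntimeError("generic processing of an assumed-size array")
    return sum(a.data[:element_count(a.extents)])
```

Option (2) returns 0 for the assumed-size case without any indication of
trouble. Option (3) at least stops, but, as noted above, the program
generally cannot recover from that.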


3. Contexts for additional usage of assumed rank

3.1 Whole-array reductions

It would seem to be unproblematic to permit assumed rank in array reduction
intrinsics when there is no DIM argument. The result in this case is
reduced all the way to scalar, so there is no nightmare "variable rank"
expression.

ALL, ANY, REDUCE, SUM, PRODUCT, MAXVAL, MINVAL, IALL, IANY,
IPARITY, PARITY.

Currently all those functions require the argument being reduced to be an
array. It would be highly undesirable for the scalar association case of
assumed rank to be an error, and the result is obvious (it's just the
value of the scalar).
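The following Python sketch illustrates a whole-array reduction (no DIM=)
over an assumed-rank argument. The function name and the explicit identity
argument are assumptions of the sketch. An empty extents tuple models
scalar association. It has exactly one element, so the result is simply
that element's value.

```python
import operator
from functools import reduce
from typing import Any, Callable, Sequence, Tuple


def reduce_whole(op: Callable[[Any, Any], Any], identity: Any,
                 extents: Tuple[int, ...], data: Sequence[Any]) -> Any:
    """Reduce every element to a scalar, like SUM(A) or ALL(A) without DIM=.

    data holds the elements in array element order; extents == () is the
    rank-0 (scalar) case and must supply exactly one element."""
    size = 1
    for e in extents:
        size *= e
    if len(data) != size:
        raise ValueError("data does not match extents")
    # The result is a scalar whatever the rank, so no variable-rank
    # expression ever arises.
    return reduce(op, data, identity)


print(reduce_whole(operator.add, 0, (2, 3), range(6)))   # 15 (rank 2)
print(reduce_whole(operator.add, 0, (), [7]))            # 7 (scalar)
```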

Consistency might suggest permitting scalar all the time, but of course
that would inhibit detection of typos in the usage of such functions; the
consistency argument is weaker than the error-detection argument, so unless
there is another reason for permitting scalar, they should continue to
require arrays.

The location reductions MAXLOC, MINLOC, FINDLOC could also be permitted for
assumed rank. The size of the result (which equals the rank of the
argument) would not be constant, but its rank would be (it is always one).
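Here is a Python sketch of MAXLOC without DIM= on an assumed-rank argument.
The names are hypothetical. As in Fortran, subscripts are reported as if
every lower bound were 1, the first maximum in array element order wins,
and a zero-sized argument gives all zeros. Only the length of the result
depends on the rank.

```python
from typing import List, Sequence, Tuple


def subscripts_of(extents: Tuple[int, ...], k: int) -> List[int]:
    """Subscripts (lower bounds taken as 1) of the k-th element, counting
    from 0, in array element order: the first subscript varies fastest."""
    subs = []
    for e in extents:
        subs.append(k % e + 1)
        k //= e
    return subs


def maxloc_whole(extents: Tuple[int, ...], data: Sequence[float]) -> List[int]:
    """MAXLOC(A) without DIM=: a rank-one result of size len(extents)."""
    if not data:
        return [0] * len(extents)   # zero-sized array: all zeros
    best = 0
    for k in range(1, len(data)):
        if data[k] > data[best]:    # strict '>' keeps the first maximum
            best = k
    return subscripts_of(extents, best)


# A(2,3) stored in array element order; the first 5 is A(2,1).
print(maxloc_whole((2, 3), [1, 5, 2, 5, 0, 3]))   # [2, 1]
```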

3.2 Array constructors

An array is permitted in an array constructor, and simply expands to its
elements in array element order. That seems perfectly reasonable for
assumed rank (as usual, except in the assumed-size situation).
On the other hand, it might not be very useful.
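The expansion rule can be sketched in Python. The Arr class and the
construct function are hypothetical. An array item contributes its
elements in array element order, whatever its rank or bounds, and a rank-0
item contributes exactly one value.

```python
from dataclasses import dataclass
from itertools import product
from typing import Any, Dict, Iterator, Tuple


@dataclass
class Arr:
    """An array of any rank: bounds plus a map from subscripts to values."""
    lbounds: Tuple[int, ...]
    ubounds: Tuple[int, ...]
    elems: Dict[Tuple[int, ...], Any]


def element_order(lbounds, ubounds) -> Iterator[Tuple[int, ...]]:
    """Subscript tuples in array element order (first subscript fastest).
    For rank 0 this yields a single empty tuple."""
    ranges = [range(lo, hi + 1) for lo, hi in zip(lbounds, ubounds)]
    # itertools.product varies its last argument fastest, so pass the
    # ranges in reverse and reverse each tuple it produces.
    for rev in product(*reversed(ranges)):
        yield tuple(reversed(rev))


def construct(*items: Any) -> list:
    """Model of an array constructor [ item, item, ... ]."""
    out = []
    for item in items:
        if isinstance(item, Arr):
            order = element_order(item.lbounds, item.ubounds)
            out.extend(item.elems[s] for s in order)
        else:
            out.append(item)
    return out
```

For example, an A(0:1,1:2) with A(i,j) = 10*i+j used in [ -1, A, 99 ] gives
[-1, 1, 11, 2, 12, 99].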

3.3 Array transformation intrinsics

The only one here that is clearly non-problematic is RESHAPE, because the
rank of its result is fixed by the (constant) size of its SHAPE argument.
It does not seem very useful by itself.
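Here is a Python sketch of RESHAPE with an assumed-rank SOURCE. The
function name is hypothetical, and PAD= and ORDER= are omitted. Only the
elements of SOURCE in array element order are needed. The result rank is
len(shape), which is known whatever the rank of SOURCE.

```python
from typing import Dict, Sequence, Tuple


def reshape_model(source: Sequence[float],
                  shape: Tuple[int, ...]) -> Dict[Tuple[int, ...], float]:
    """source: SOURCE's elements in array element order (any rank).
    shape: the SHAPE argument. Returns the result as a map from 1-based
    subscript tuples to values; its rank is len(shape)."""
    need = 1
    for e in shape:
        need *= e
    if len(source) < need:
        raise ValueError("SOURCE is too small and PAD is absent")
    result = {}
    for k in range(need):
        subs, r = [], k
        for e in shape:            # first subscript varies fastest
            subs.append(r % e + 1)
            r //= e
        result[tuple(subs)] = source[k]
    return result


r = reshape_model([1, 2, 3, 4, 5, 6], (3, 2))
print(r[(3, 1)], r[(1, 2)])   # 3 4
```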

3.4 Actual argument for sequence association

This would seem to be unproblematic in all cases. If contiguous, the
processor can just pass the address of the first element, and if non-
contiguous, the processor can do copy-in/copy-out (at the usual cost).
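The two cases can be modelled in Python. The View descriptor and all other
names here are hypothetical. A contiguous actual is passed as the position
of its first element. A non-contiguous one is gathered into a temporary,
passed, and scattered back.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class View:
    """Hypothetical descriptor: the element with zero-based subscripts
    (i1, i2, ...) lives at memory[base + i1*strides[0] + i2*strides[1] ...]."""
    memory: List[float]
    base: int
    extents: Tuple[int, ...]
    strides: Tuple[int, ...]

    def size(self) -> int:
        n = 1
        for e in self.extents:
            n *= e
        return n

    def offsets(self) -> List[int]:
        """Memory offsets of the elements, in array element order."""
        offs = [self.base]
        for e, s in zip(self.extents, self.strides):
            offs = [o + i * s for i in range(e) for o in offs]
        return offs

    def is_contiguous(self) -> bool:
        if self.size() == 0:
            return True
        expected = 1
        for e, s in zip(self.extents, self.strides):
            if e > 1 and s != expected:
                return False
            expected *= e
        return True


# proc(memory, start, n) sees n consecutive elements, like a dummy argument
# that is sequence associated with the actual.
Proc = Callable[[List[float], int, int], None]


def pass_by_sequence_association(actual: View, proc: Proc) -> None:
    n = actual.size()
    if actual.is_contiguous():
        proc(actual.memory, actual.base, n)   # address of first element
        return
    offs = actual.offsets()
    temp = [actual.memory[o] for o in offs]   # copy-in
    proc(temp, 0, n)
    for o, v in zip(offs, temp):              # copy-out
        actual.memory[o] = v


def double_all(mem: List[float], start: int, n: int) -> None:
    """Example procedure: doubles the n elements it is given."""
    for i in range(start, start + n):
        mem[i] *= 2
```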

On the other hand, many people prefer to use assumed-shape, as sequence
association requires passing any necessary size/shape info separately.
It doesn't seem like a great idea to have special support for an old-
fashioned (and error-prone) feature and not the more modern ones.

We could, of course, allow passing by assumed-shape simply by adding
size-one extents for missing extents, and collapsing extra extents into
the final extent (with copy-in/copy-out if not contiguous in the
collapsed extents). This does not seem too complicated for normal use
(and users could possibly arrange it themselves if they wanted to).
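The extent adjustment just described can be sketched in Python. The
function name is hypothetical. Array element order does not change, so
padding and folding only relabel the elements. A copy is needed only when
the folded extents are not contiguous.

```python
from typing import Tuple


def conform_extents(extents: Tuple[int, ...], dummy_rank: int) -> Tuple[int, ...]:
    """Extents an assumed-rank actual would present to an assumed-shape
    dummy of rank dummy_rank: missing extents become 1, and any surplus
    extents are folded into the final one."""
    if dummy_rank < 1:
        raise ValueError("an assumed-shape dummy has rank at least 1")
    ext = list(extents)
    if len(ext) <= dummy_rank:
        return tuple(ext + [1] * (dummy_rank - len(ext)))
    folded = 1
    for e in ext[dummy_rank - 1:]:
        folded *= e
    return tuple(ext[:dummy_rank - 1] + [folded])


print(conform_extents((2, 3, 4), 2))   # (2, 12)
print(conform_extents((5,), 3))        # (5, 1, 1)
```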

3.5 Contiguous assumed rank

There are additional contexts where a contiguous assumed rank object could
be used, for example, as the target in a rank-remapped pointer assignment.
Because rank-remapping pointer assignment has all the bounds specified, it
would be unproblematic for assumed-size too.
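Here is a Python sketch of the element correspondence for a rank-remapping
pointer assignment such as P(1:M,1:N) => A. The names are hypothetical.
Only the pointer's own bounds are used, so an unknown final extent in A
does not matter, provided A has at least as many elements as P.

```python
from typing import Sequence


def remap_offset(subscripts: Sequence[int], lbounds: Sequence[int],
                 ubounds: Sequence[int]) -> int:
    """Zero-based position, in A's array element order, of the element that
    P(subscripts) refers to after  P(lb1:ub1, lb2:ub2, ...) => A.

    Only P's bounds are used; A's shape (possibly with an unknown final
    extent) never enters the calculation."""
    offset, stride = 0, 1
    for s, lo, hi in zip(subscripts, lbounds, ubounds):
        if not lo <= s <= hi:
            raise IndexError(f"subscript {s} outside {lo}:{hi}")
        offset += (s - lo) * stride
        stride *= hi - lo + 1
    return offset


# P(1:3, 1:2) => A : P(1,2) is the 4th element of A (offset 3).
print(remap_offset((1, 2), (1, 1), (3, 2)))   # 3
```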

There has also been a suggestion to allow rank-remapping in ASSOCIATE to
enable this to be done. That seems to be unnecessary given we already have
rank remapping for pointers, and there has been no clamour for it in
ASSOCIATE for things that are not assumed-rank.

3.6 C_LOC

C_LOC of an array returns the address of its first element. There would
thus seem to be no problem permitting it for an assumed rank array, though
it would not be of much use unless the array is contiguous.

3.7 Array elements

A slight extension of the existing rank-independent subscripting syntax
would make it possible to reference elements of an assumed-rank array, even
one that is associated with an assumed-size actual argument. The question
that arises is what happens when the number of subscripts is incorrect.
For insufficient subscripts, some obvious possibilities are
    - error termination;
    - the missing subscripts are treated as equal to the lower bound;
    - processor-dependent random garbage results or a crash;
    - a pseudo-subscript STAT=, as we have for image indexing.
For too many subscripts, some obvious possibilities are
    - error termination;
    - the extra subscripts are ignored;
    - a segmentation fault crash.

The slight extension is that we need to clearly permit the subscripting
vector to have non-constant size when the object is assumed rank.
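The listed choices can be compared using a Python model of A(@IDX). All
names are hypothetical. IDX may have any size. The too_few and too_many
arguments select among the error, lower-bound and ignore behaviours listed
above. The STAT= and garbage/crash alternatives are not modelled. A final
extent of None stands for an assumed-size object, for which no upper-bound
check is possible in the last dimension.

```python
from typing import Optional, Sequence, Tuple


class SubscriptCountError(Exception):
    """Raised for the 'error termination' choice."""


def element_at(lbounds: Tuple[int, ...], extents: Tuple[Optional[int], ...],
               data: Sequence[float], idx: Sequence[int],
               too_few: str = "error", too_many: str = "error") -> float:
    """Model of A(@IDX) where A's elements are held in array element order.

    too_few:  "error" or "lower" (missing subscripts take the lower bound).
    too_many: "error" or "ignore" (extra subscripts are dropped)."""
    rank = len(lbounds)
    subs = list(idx)
    if len(subs) < rank:
        if too_few != "lower":
            raise SubscriptCountError(f"{len(subs)} subscripts for rank {rank}")
        subs.extend(lbounds[len(subs):])
    elif len(subs) > rank:
        if too_many != "ignore":
            raise SubscriptCountError(f"{len(subs)} subscripts for rank {rank}")
        subs = subs[:rank]
    k, stride = 0, 1
    for s, lo, e in zip(subs, lbounds, extents):
        if s < lo or (e is not None and s >= lo + e):
            raise IndexError(f"subscript {s} out of bounds")
        k += (s - lo) * stride
        if e is not None:          # only the final extent may be unknown
            stride *= e
    return data[k]
```

With A(2,3) held as [11, 21, 12, 22, 13, 23] (A(i,j) = 10*i+j), IDX = [2, 3]
selects 23, while IDX = [2] with too_few="lower" selects A(2,1) = 21.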

3.8 Array sections with triplets

A similar extension is possible here, and similar questions arise when the
user gets it wrong. It would, however, be important not to permit triplet
vectors to have variable size; otherwise we would not know the rank of the
subobject that would be produced.
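Here is a Python sketch of how the section's shape would follow from
rank-independent triplet vectors. The function name is hypothetical. It
uses the usual extent rule MAX((upper - lower + stride) / stride, 0). One
extent is produced per triplet, so the rank of the section is the size of
the triplet vectors. If that size were not constant, the rank would not be
known at compile time.

```python
from typing import Sequence, Tuple


def section_extents(lower: Sequence[int], upper: Sequence[int],
                    stride: Sequence[int]) -> Tuple[int, ...]:
    """Shape of a section selected by rank-independent triplet vectors.

    One extent is produced per triplet, so the rank of the section equals
    the common size of the three vectors."""
    if not len(lower) == len(upper) == len(stride):
        raise ValueError("triplet vectors must have the same size")
    shape = []
    for lo, hi, st in zip(lower, upper, stride):
        if st == 0:
            raise ValueError("stride must not be zero")
        # Fortran truncates the quotient and Python's // floors it, but the
        # two agree once negative results are clamped to zero.
        shape.append(max((hi - lo + st) // st, 0))
    return tuple(shape)


print(section_extents([1, 1], [10, 5], [2, 1]))   # (5, 5)
```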

3.9 Other array sections

These don't seem reasonably possible without falling foul of the "unknown
rank" issue.


4. Assumed rank array traversal

Due to the problem of variable-rank expressions, there is no obvious way to
use elemental procedures on assumed-rank objects, even though the compiler
would know how to traverse them.

Possibilities here would be
    - a DO loop which has an index vector (of unknown size)
      instead of an index variable;
    - a DO association loop which has an associate-name that is associated
      with consecutive array elements on each iteration.

Actually, it would probably be better not to reuse the DO keyword here, as
this is rather a special case, and operates quite differently to the normal
DO.

For example, in casual BNF:
(a)
    TRAVERSE (assumed-rank-object-name) WITH (index-vector-name)
        block
    END TRAVERSE

In the block, index-vector-name would take on successive values such that
it traverses the object in array element order. For example
    TRAVERSE (A) WITH (IDX)
        ... here, A(@IDX) is the array element for this iteration.

I think we'd want IDX to be a construct entity of rank 1, size equal to
RANK(A), and of type INTEGER with a processor-dependent kind not less than
default integer kind. Or we could allow an integer-type-spec in front of
IDX, e.g.
    TRAVERSE (A) WITH ( [ integer-type-spec :: ] IDX )
but it may be cleaner just to require the processor to automatically make
the kind big enough to hold any subscript value.
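The index-vector semantics of form (a) can be modelled in Python with an
"odometer" step: increment the first subscript and carry into later ones.
The function name is hypothetical, and the model assumes both bounds of
every dimension are known. That is exactly what fails for assumed-size,
so one of the approaches in section 2 would still be needed.

```python
from typing import Iterator, List, Sequence


def traverse_indices(lbounds: Sequence[int],
                     ubounds: Sequence[int]) -> Iterator[List[int]]:
    """Model of TRAVERSE (A) WITH (IDX): yields IDX for each iteration.

    IDX has size RANK(A) and runs through A in array element order (first
    subscript fastest). A scalar gets one iteration with an empty IDX; a
    zero-sized object gets none."""
    if any(hi < lo for lo, hi in zip(lbounds, ubounds)):
        return
    idx = list(lbounds)
    while True:
        yield list(idx)          # a copy, so callers may keep it
        d = 0
        while d < len(idx):      # odometer step with carry
            if idx[d] < ubounds[d]:
                idx[d] += 1
                break
            idx[d] = lbounds[d]
            d += 1
        else:
            return               # carried past the last dimension: done
```

For bounds (1:2, 1:2) this yields [1, 1], [2, 1], [1, 2], [2, 2].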

(b)
    TRAVERSE (assumed-rank-object-name) ASSOCIATE (element-name)
        block
    END TRAVERSE

In the block, element-name would be associated with each successive element
in the iteration, e.g.
    TRAVERSE (A) ASSOCIATE (ELT)
        ... here, ELT is the array element for this iteration

This avoids messing around with index values, needing to know what integer
kind they should be, etc. If one wants to mess around with index values,
e.g. to do a neighbourhood computation, perhaps one should be constructing
and updating the index values manually anyway.
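Form (b) can be modelled in Python by associating a handle with each
element in turn. The names are hypothetical. For simplicity the model
assumes A is held contiguously in array element order. A processor would
follow its own descriptor for non-contiguous objects. Updates through ELT
update A, and no index values or integer kinds appear anywhere.

```python
from typing import Iterator, List


class ElementRef:
    """Stands in for the associate-name ELT: reading or assigning
    ELT.value reads or updates the current element of A."""

    def __init__(self, storage: List[float], k: int) -> None:
        self._storage = storage
        self._k = k

    @property
    def value(self) -> float:
        return self._storage[self._k]

    @value.setter
    def value(self, v: float) -> None:
        self._storage[self._k] = v


def traverse_associate(storage: List[float]) -> Iterator[ElementRef]:
    """Model of TRAVERSE (A) ASSOCIATE (ELT): each iteration associates
    ELT with the next element in array element order."""
    for k in range(len(storage)):
        yield ElementRef(storage, k)


a = [1.0, 2.0, 3.0, 4.0]
for elt in traverse_associate(a):
    elt.value = elt.value * 10
print(a)   # [10.0, 20.0, 30.0, 40.0]
```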


5. Conclusions

Rank-remapping pointer assignment would seem to be easy and lacking in
problems (even assumed-size).

After that, array reduction intrinsics without DIM= have no problems
other than the assumed-size one, and so would seem to be worth considering.

Although there are even more potential problems, the more general array
element subscripting and array traversal operations are also worth further
consideration.

===END===