11-175
To: J3
From: Nick Maclaren
Subject: Interop TR: CFI_section and strides
Date: 2011 June 09
Reference: N1854


I don't like one aspect of CFI_section at all, but cannot find any
paper explaining why the decision was taken the way that it was.
However, I need to describe another topic first.


Background
----------

I was thoroughly confused by the TRANSPOSE and RESHAPE intrinsics for a
long time, and still think that the standard is unclear.  I had
incorrectly assumed that they delivered an association, in the same way
that a section does, and not a free-standing value.  I still think that
it was a mistake not to create such a class of intrinsic, but that
decision was taken 20 years ago.

Another part of the problem is that it IS possible to create an array
section that is associated with the diagonal of a contiguous square
array, and do similar reshaping (including increasing the rank of an
array), even in pure Fortran.  All you have to do is to use either
sequence or pointer association, though both need contiguity.  Also,
being able to create diagonal sections was shown to be straightforward
in Algol68, was in earlier drafts of Fortran 90, and its omission was
regretted by several people.
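
To illustrate the point about diagonals, here is a minimal C sketch,
assuming a contiguous n-by-n array of doubles in Fortran (column-major)
order.  The names strided_view, diagonal_view and view_elem are
hypothetical and used only for this illustration; they are not part of
the TR.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical rank-one view: base address, extent, stride in elements. */
typedef struct {
    double *base;
    size_t extent;
    size_t stride;
} strided_view;

/* Diagonal of a contiguous n-by-n array: element (i,i), zero-based,
   lies at element offset i*(n+1) from the start of the array. */
static strided_view diagonal_view(double *a, size_t n)
{
    assert(a != NULL);
    strided_view v = { a, n, n + 1 };
    return v;
}

/* Address of element i (zero-based) of a view. */
static double *view_elem(strided_view v, size_t i)
{
    assert(i < v.extent);
    return v.base + i * v.stride;
}
```

The view aliases the original storage, which is exactly the kind of
association that a section provides and that TRANSPOSE and RESHAPE do
not.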

A related part is that it is a long-standing requirement to do those,
and similar manipulations, more cleanly, and without the constraint
of contiguity when it is unnecessary.  Most FFT codes rely on this,
and it is often why they are stuck in the Fortran 77 era.

Lastly, one otherwise trivial problem that we punted on previously (in
this TR) is that the constraints on stride consistency (10:26-29) are
needed in most cases, but aren't always for extent sizes of 0 and 1.
Hard cases make bad law, and I don't propose complicating the wording,
but it does affect this function.


Current Specification
---------------------

The current specification of CFI_section provides exactly the right tool
to create array diagonals, and then forbids doing that, as an apparently
gratuitous restriction.  A lot of people will use it to do such
reshaping, no matter what this TR says, find that it works on one
system, and then complain when it doesn't on another.

Also, it is very error-prone to use and there is currently one omission
in the wording and one lack of clarity.  The omission is that it does
not say anything about the sm member of elements with member
extent = -1.

The lack of clarity is that the constraints on descriptor consistency
apply to the result descriptor and not to the argument, and C
programmers will quite correctly realise that the sm member is not used
when the extent size is 0 or 1.  That will definitely cause confusion
over when CFI_INVALID_SM must be returned, must not be returned and may
be returned.
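
The reasoning a C programmer would follow can be sketched as below.
This is an illustration only: dim_entry is a hypothetical stand-in
modelled on the lower_bound, extent and sm members discussed here, and
dim_offset is not a function of the TR.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one dimension entry of a descriptor. */
typedef struct {
    ptrdiff_t lower_bound;
    ptrdiff_t extent;
    ptrdiff_t sm;           /* stride in bytes */
} dim_entry;

/* Byte offset contributed by subscript i in one dimension.
   The subscript must lie within the bounds of that dimension. */
static ptrdiff_t dim_offset(dim_entry d, ptrdiff_t i)
{
    assert(i >= d.lower_bound && i - d.lower_bound < d.extent);
    return (i - d.lower_bound) * d.sm;
}
```

With extent equal to 1 the only valid subscript is lower_bound, so the
offset is 0 whatever sm holds; with extent equal to 0 there is no valid
subscript at all.  In neither case can the value of sm be observed
through element addressing.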

Some attention is needed to sort these issues out, whatever is decided.


Possibility 1
-------------

We could relax the current wording to allow the creation of valid views
of an array that are not possible in Fortran, but would be 'safe'.  I
originally started with that opinion, but have come round to the view
that it is not a desirable path to follow in this TR.  Its implications
are just too fundamental.

For example, we could provide a proper CFI_reshape function, either as
an alternative or a replacement, that would look a bit like this:

    int CFI_reshape ( CFI_cdesc_t * result, const CFI_cdesc_t * source,
        CFI_attribute_t attribute, CFI_rank_t rank,
        const CFI_dim_t dim[] );

This would establish a reshaping of the source, subject only to the
resulting descriptor being valid and all elements of the result being
elements of the source.  The wording for this would be slightly simpler
than that for the current CFI_section, as there would be no special
interpretation of the dim argument.

This possibility may have been considered in Las Vegas, but it is hard
to tell.  I feel that it needs explicit consideration, but would not
favour adopting it, because I do not know how much it would impact on
existing implementations.


Possibility 2
-------------

We could make no technical change and just fix the omission and lack of
clarity.  The wording is hardly any shorter because we need to put more
effort into the CFI_dim_t consistency wording.

This approach would seem to leave the door open to future extensions,
but they would be at most things like producing diagonals.  CFI_section
could not be extended to a general reshaping function, and I believe
that reshaping would be better done by a proper CFI_reshape in any case.

I do not favour it, because it is nearly as error-prone as a general
reshape, only marginally more flexible than Fortran sections, and
messier to specify than either.


Possibility 3
-------------

We could reduce the opportunity to create improper arrays and increase
the safety of this call, by making the sm members relative and not
absolute, exactly as in Fortran sections.  Note that the argument that a
value of type CFI_dim_t is being treated anomalously doesn't hold water,
because that has to be done even under the existing approach!

I believe that this will be less error-prone in use, and offers
considerably less opportunity for C programmers to shoot their feet off
while claiming they are just doing what the TR allows.  As far as I can
see, its functionality is identical.
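
The difference between the two conventions can be sketched as follows.
This is an illustration under stated assumptions, not proposed text:
both function names are hypothetical, strides are in bytes as for the sm
member, and the divisibility test is only one necessary condition for a
proper section, not a complete check.

```c
#include <assert.h>
#include <stddef.h>

/* Absolute convention: the caller supplies the result sm in bytes, so
   it has to be checked against the source.  Being a multiple of the
   source sm is necessary (not sufficient) for a proper section. */
static int absolute_sm_is_multiple(ptrdiff_t source_sm, ptrdiff_t requested_sm)
{
    assert(source_sm != 0);
    return requested_sm % source_sm == 0;
}

/* Relative convention: the caller supplies a stride counted in elements
   of the source dimension, and the result sm is derived from it. */
static ptrdiff_t sm_from_relative(ptrdiff_t source_sm, ptrdiff_t stride)
{
    return source_sm * stride;
}
```

Under the relative convention the divisibility condition holds by
construction, so that particular mistake cannot be made by the caller.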

This document is proposing this approach, in one of two forms, with a
mention of a third.  The first is a minimal change, and the second
splits the dim argument up, on the grounds that it is not being used as
a normal CFI_dim_t value (even at present).  The lower bounds, extents
and strides are then specified as separate arguments, and each may be
NULL, which adds a useful convenience.  The forms are otherwise
functionally equivalent.


Edits to N1854:
---------------

Alternative A:
-------------

[17:24] Delete "of dim" and, after "value of -1", append "and the
corresponding sm member is ignored".  Also append "The other sm members
of dim are treated as relative strides and are multiplied by the
corresponding sm members of source to produce the sm members of result."

[18:1] Delete "*source->dim[0].sm".


Alternative B:
-------------

[17:9] Replace "const CFI_dim_t dim[]" by "const CFI_index_t
lower_bounds[], const CFI_index_t upper_bounds[], const CFI_index_t
strides[]".

[17:19-24] Replace the paragraph specifying the dim argument by:

"lower_bounds points to an array specifying the subscripts of the
element in the given array that is the first element of the array
section.  If it is NULL, the first element of source is used.

"upper_bounds points to an array specifying the subscripts of the
element in the given array that is the last element of the array
section.  If it is NULL, the last element of source is used.

"strides points to an array specifying the strides of the array section
in units of the sm member of the dim member of argument source; if an
element is 0, the corresponding dimension is a subscript and the
corresponding elements of lower_bounds and upper_bounds shall be equal.
If it is NULL, the strides are treated as all being 1."

[17:27-28] Replace "the number of dim entries for which the extent
member is equal to -1" by "the number of elements of strides that are
equal to 0".

[17:34-39] and [18:1-2] Replace the example by:

"the following code fragment establishes a C descriptor for the array
section A(3::5).

    CFI_index_t lower_bounds[] = {2}, strides[] = {5};
    CFI_CDESC_T(1) section;
    int ind;
    ind = CFI_section ( (CFI_cdesc_t *) &section, source,
        CFI_attribute_assumed, lower_bounds, NULL, strides );

If source already points to a C descriptor for the rank-two Fortran
array A declared as

    real A(100,100)

the following code fragment establishes a C descriptor for the rank-one
array section A(:,42).

    CFI_index_t lower_bounds[] = {source->dim[0].lower_bound,41},
        upper_bounds[] = {source->dim[0].lower_bound
            + source->dim[0].extent - 1,41},
        strides[] = {1,0};
    CFI_CDESC_T(1) section;
    int ind;
    ind = CFI_section ( (CFI_cdesc_t *) &section, source,
        CFI_attribute_assumed, lower_bounds, upper_bounds,
        strides );
"


{{{ Optionally:

    1) We could ignore the upper bound element if the stride is zero,
rather than requiring it to be set.

    2) I do not think that we need the following any longer, but it
could easily be added if people prefer:

[17:28] Append to the end of the paragraph "The argument result shall be
such that it specifies an array that could have been obtained by
associating the source argument with a Fortran assumed-shape array and
applying array section notation in Fortran." }}}
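
As an informal illustration of the Alternative B semantics (not proposed
text for the TR), the following sketch shows how the three arrays might
be combined.  It makes several assumptions: dim_entry and section_sketch
are hypothetical names, only non-negative strides are handled, bounds
must lie within the source, the lower bounds of the result are set to 0,
and a NULL upper_bounds is not given any special meaning when a stride
is 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one dimension entry of a descriptor. */
typedef struct { ptrdiff_t lower_bound, extent, sm; } dim_entry;

/* Computes the dimensions of the section in out[] and the byte offset
   of its first element in *offset.  Returns the rank of the section,
   or -1 if the arguments are not acceptable to this sketch. */
static int section_sketch(int rank, const dim_entry src[],
                          const ptrdiff_t *lower, const ptrdiff_t *upper,
                          const ptrdiff_t *strides,
                          dim_entry out[], ptrdiff_t *offset)
{
    int r = 0;
    assert(rank >= 0 && src != NULL && out != NULL && offset != NULL);
    *offset = 0;
    for (int i = 0; i < rank; i++) {
        ptrdiff_t first = src[i].lower_bound;
        ptrdiff_t last = first + src[i].extent - 1;
        ptrdiff_t lo = lower ? lower[i] : first;    /* NULL: first element */
        ptrdiff_t hi = upper ? upper[i] : last;     /* NULL: last element */
        ptrdiff_t st = strides ? strides[i] : 1;    /* NULL: all strides 1 */
        if (st < 0 || lo < first || hi > last) return -1;
        if (st == 0 && lo != hi) return -1;         /* bounds shall be equal */
        *offset += (lo - first) * src[i].sm;
        if (st == 0) continue;                      /* subscript: rank drops */
        out[r].lower_bound = 0;
        out[r].extent = hi < lo ? 0 : (hi - lo) / st + 1;
        out[r].sm = st * src[i].sm;                 /* relative to absolute */
        r++;
    }
    return r;
}
```

Applied to the two examples above with 4-byte elements, this gives a
rank-one result in both cases: 20 elements with sm 20 for A(3::5) on a
100-element array, and 100 elements with sm 4 for A(:,42).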


Alternative C:
-------------

We could restore the CFI_bounds_t type, and proceed as in alternative B,
but by putting the wording in different places.  I do not think that it
is worthwhile for a single function.