To: J3 J3/19-174
From: Van Snyder
Subject: Rank-agnostic array element and section denotation
Date: 2019-July-05
References: 18-247 19-110r1 19-150 19-170 19-173
Background
==========
19-150 was a successor to 19-110r1, which proposed rank-agnostic syntax
to declare arrays and to reference their elements or sections. 19-150 did
not address rank-agnostic array declaration; that is the subject of 19-173.
19-150 unnecessarily restricted the functionality proposed in 19-110r1,
and extended the syntax proposed in 19-110r1 in an unnecessary direction.
19-150 claims that new syntax is necessary to avoid a syntax ambiguity.
19-110r1 proposed to allow an array A of rank k to have a single
subscript B of rank n+1 with shape [ k, e1, e2, ..., en ]. Then
RANK(A(B)) == RANK(B) - 1 == n, and SHAPE(A(B)) == [ e1, e2, ..., en ].
This is clearly useful, as illustrated in 19-110r1.
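The following Python sketch models the rank and shape rule above; it is
an illustration, not Fortran and not part of any proposal. It assumes,
for illustration only, that A is represented by a callable taking one
subscript per dimension, and that B is supplied as a flat list in
array-element order (first subscript varying fastest). The names
rank_agnostic_ref and the lambda standing in for A are hypothetical.

```python
from math import prod


def rank_agnostic_ref(a, b_flat, b_shape):
    """Hypothetical model of A(B) as proposed in 19-110r1.

    a       -- callable standing in for the rank-k array A; it takes k
               subscripts and returns the element they designate
    b_flat  -- the elements of B in array-element order
    b_shape -- SHAPE(B) == [k, e1, ..., en]

    Returns (SHAPE(A(B)), elements of A(B) in array-element order).
    """
    k, *trailing = b_shape
    if len(b_flat) != prod(b_shape):
        raise ValueError("b_flat does not match b_shape")
    # In array-element order each column B(:, j1, ..., jn) is a contiguous
    # run of k values: one complete set of subscripts for A.
    elements = [a(*b_flat[c:c + k]) for c in range(0, len(b_flat), k)]
    return trailing, elements


# A rank-2 "array" with A(i,j) == 10*i + j, and B of shape [2, 2, 3], so
# RANK(A(B)) == RANK(B) - 1 == 2 and SHAPE(A(B)) == [2, 3].
shape, elems = rank_agnostic_ref(lambda i, j: 10 * i + j,
                                 list(range(1, 13)), [2, 2, 3])
print(shape, elems)   # prints [2, 3] [12, 34, 56, 78, 100, 122]
```

The first extent of B is consumed as subscripts; only the remaining
extents survive as the shape of the result.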
19-150 proposes to allow only RANK(B) == 1; RANK(B) > 1 is prohibited.
In A(B), if RANK(A) == 1, RANK(B) == 1, and SIZE(B) == 1, then A(B) is
currently interpreted as an array section with a vector subscript, for
which RANK(A(B)) == 1, not zero, and SHAPE(A(B)) == [1].
So as not to conflict with the existing specification of vector
subscripts, if RANK(A) == 1, RANK(B) == 1, and B is a rank-agnostic
subscript, the rank of A(B) would necessarily be defined to be one
instead of zero.
In A(B), the case RANK(A) == 1, RANK(B) == 1, and SIZE(B) == 1 is a
degenerate case of the proposal in 19-110r1, wherein it was proposed
that RANK(A(B)) == RANK(B) - 1. That is, when RANK(B) == 1, A(B) would
be a scalar, not an array of shape [1].
Although potentially confusing, the irregularity of defining
RANK(A(B)) == 1 rather than zero when RANK(A) == 1, RANK(B) == 1, and
SIZE(B) == 1 is preferable to requiring A@(B), as proposed in 19-150, to
indicate that in this special case the result is a scalar.
Requiring A@(B) when RANK(A) > 1 or RANK(B) > 1 is undesirable because
the syntax differs from that of a function reference. If, later in the
lifetime of a program, it becomes necessary to replace the array A with
an abstraction realized by procedures, the syntax of every reference
would need to change, unless a function (or an updater; see 19-170) may
also be referenced using A@(B), wherein "@" has no effect.
Instead, A@(B) should be optional. When RANK(A) == 1, RANK(B) == 1,
SIZE(B) == 1, and SIZE(B) is a constant expression, the "@" indicates
that the result is a scalar. That is,
  RANK(A(B))  == RANK(B)-1  if RANK(B) > 1
  RANK(A(B))  == 1          if RANK(A) == 1 and RANK(B) == 1
  RANK(A@(B)) == RANK(B)-1  if RANK(B) > 1
  RANK(A@(B)) == 0          if RANK(B) == 1, SIZE(B) == 1, and SIZE(B)
                            is a constant expression
  RANK(A@(B)) == 1          if RANK(B) == 1, and either SIZE(B) > 1 or
                            SIZE(B) is not a constant expression
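The rules above, together with the max(1,n) rule in the Proposals
section, can be summarized as a small decision function. This is a
hypothetical Python sketch: designator_rank, its parameter names, and
the choice to report an unusable first extent as an error are
illustrative assumptions, not proposed specification text.

```python
def designator_rank(rank_a, shape_b, at_form, extent_is_constant=True):
    """Hypothetical model: rank of A(B) (at_form=False) or A@(B) (True).

    rank_a             -- RANK(A)
    shape_b            -- SHAPE(B), i.e. [K, e1, ..., en]
    extent_is_constant -- whether SIZE(B,1) is a constant expression
    """
    rank_b = len(shape_b)
    # B can be a rank-agnostic subscript only if its first extent is a
    # constant equal to RANK(A).
    rank_agnostic = extent_is_constant and shape_b[0] == rank_a
    if rank_a == 1 and rank_b == 1 and not (at_form and rank_agnostic):
        # Either an ordinary vector subscript, or A(B) with SIZE(B) == 1,
        # which keeps rank one so as not to conflict with vector subscripts.
        return 1
    if not rank_agnostic:
        raise ValueError("SIZE(B,1) must be a constant equal to RANK(A)")
    n = rank_b - 1
    # A@(B) may be a scalar; A(B) never is.
    return n if at_form else max(1, n)
```

For example, designator_rank(1, [1], at_form=False) is 1, while
designator_rank(1, [1], at_form=True) is 0.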
19-150 proposed to allow more than one rank-agnostic subscript, e.g.,
  A @( V1, ::2, V2 )
in which V1 and V2 are rank-one subscripts whose extents are constant,
such that SIZE(V1)+SIZE(V2)+1 == RANK(A). If RANK(A) == 3 and the
shapes of V1 and V2 are both [1], then without the additional syntax the
rank and shape of the result are ambiguous. If A( V1, ::2, V2 ) is
interpreted to have vector subscripts, the result is a rank-3 array
with shape [1, n, 1], where n = ( SIZE(A, SIZE(V1,1)+1 ) + 1 ) / 2.
If A( V1, ::2, V2 ) is interpreted to have rank-agnostic subscripts, the
result is a rank-one array whose shape is [n].
The ambiguity could again be resolved by using A@( V1, ::2, V2 ) to
indicate a rank-one result.
The syntax A @( V1, ::2, V2 ) is, however, not necessary if A(V) is
allowed with RANK(V) > 1, because it is equivalent to A(V), where
(assuming the lower bound of dimension SIZE(V1,1)+1 of A is 1)
V = reshape( [ ( V1, i, V2, i = 1, size(A,size(V1,1)+1), 2 ) ], &
           & [ rank(A), ( size(A,size(V1,1)+1) + 1 )/2 ] )
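A Python sketch of the reshape above, assuming (as the implied DO does)
that the strided dimension of A has lower bound 1. The name
triplet_as_subscript is hypothetical. It returns SHAPE(V) and the
elements of V in array-element order, so each consecutive run of
RANK(A) values is one set of subscripts.

```python
def triplet_as_subscript(v1, v2, extent):
    """Hypothetical helper: SHAPE(V) and elements of V for
    A@(V1, ::2, V2), where V1 and V2 are rank-one and `extent` is
    SIZE(A, SIZE(V1)+1); the lower bound is assumed to be 1."""
    columns = [list(v1) + [i] + list(v2) for i in range(1, extent + 1, 2)]
    n = (extent + 1) // 2                # number of elements ::2 selects
    assert len(columns) == n
    shape = [len(v1) + 1 + len(v2), n]   # [ RANK(A), n ]
    flat = [x for col in columns for x in col]
    return shape, flat


# RANK(A) == 3, SHAPE(V1) == SHAPE(V2) == [1], and SIZE(A,2) == 5.
print(triplet_as_subscript([2], [4], 5))
# prints ([3, 3], [2, 1, 4, 2, 3, 4, 2, 5, 4])
```

Because SHAPE(V) == [3, 3] here, A(V) has shape [3], the rank-one
result that A@( V1, ::2, V2 ) is meant to denote.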
It would also be possible to allow
  A ( V1, V2, ..., Vm ) or A @( V1, V2, ..., Vm )
wherein RANK(V1) == RANK(V2) == ... == RANK(Vm) > 1, provided
SUM(SIZE(V[i],1),i=1:m) == RANK(A), and all other extents of V1, V2,
..., Vm are the same, viz., the remaining part of each shape is
[ e1, e2, ..., en ]. The result would be a rank-n entity having shape
[ e1, e2, ..., en ]. This is difficult to explain and, again,
unnecessary, as it can be expressed as A(V), where V is composed of V1,
V2, ..., Vm.
Assume A is a rank-5 array, and
         [ 1 2 3 ]              [ 2 3 4 ]
    V1 = [       ]   and   V2 = [ 6 1 2 ] .
         [ 4 5 6 ]              [ 1 7 5 ]
Then A(V1,V2) has shape [3] (because the second extents of V1 and V2
are both 3). Its elements are those of
  [ A(1,4,2,6,1), A(2,5,3,1,7), A(3,6,4,2,5) ]
but, unlike this array constructor, A(V1,V2) is a variable, not an
expression. It is, however, the same as A(V), where
  V = reshape( [ ( V1(:,i), V2(:,i), i = 1, 3 ) ], [ 5, 3 ] ), or
         [ 1 2 3 ]
         [ 4 5 6 ]
    V =  [ 2 3 4 ] .
         [ 6 1 2 ]
         [ 1 7 5 ]
Therefore, allowing A ( V1, V2, ..., Vm ) or A @( V1, V2, ..., Vm ) is
not necessary.
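The construction of V from V1 and V2 can be checked mechanically. In
this Python sketch (an illustration only; columns and stack_subscripts
are hypothetical helpers), matrices are written as lists of rows, as
displayed above, and each column of the combined V is the concatenation
of the corresponding columns of V1, V2, ..., Vm.

```python
def columns(rows):
    """Columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*rows)]


def stack_subscripts(*vs):
    """Hypothetical helper: columns of V composed of V1, V2, ..., Vm.

    Column j of V is V1(:,j), V2(:,j), ..., Vm(:,j) concatenated, i.e.
    one complete set of RANK(A) subscripts.
    """
    cols = [columns(v) for v in vs]
    ncols = len(cols[0])
    if any(len(c) != ncols for c in cols):
        raise ValueError("V1, ..., Vm must have the same trailing extents")
    return [sum((c[j] for c in cols), []) for j in range(ncols)]


V1 = [[1, 2, 3],
      [4, 5, 6]]
V2 = [[2, 3, 4],
      [6, 1, 2],
      [1, 7, 5]]
V = stack_subscripts(V1, V2)
for subscripts in V:
    print("A(%s)" % ",".join(map(str, subscripts)))
```

The printed designators are A(1,4,2,6,1), A(2,5,3,1,7), and
A(3,6,4,2,5), and transposing V reproduces the 5 x 3 display of V.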
Assume A is a rank-5 array whose third dimension has bounds 1:3, and
         [ 1 2 3 ]              [ 2 3 4 ]
    V1 = [       ]   and   V3 = [       ] ;
         [ 4 5 6 ]              [ 1 7 5 ]
then A(V1,::2,V3) has shape [2,3] and its elements are
    [ A(1,4,1,2,1), A(2,5,1,3,7), A(3,6,1,4,5) ]
    [ A(1,4,3,2,1), A(2,5,3,3,7), A(3,6,3,4,5) ]
Alternatively, its shape might be defined to be [3,2], with elements
    [ A(1,4,1,2,1), A(1,4,3,2,1) ]
    [ A(2,5,1,3,7), A(2,5,3,3,7) ]
    [ A(3,6,1,4,5), A(3,6,3,4,5) ]
Some care would be necessary in the description, including a definition
of the array-element order, because this is not the same as "spreading"
"::2" to an array
         [ 1 1 1 ]
    V2 = [       ]
         [ 3 3 3 ]
and making the reference equivalent to A(V1,V2,V3), which would require
A to be a rank-6 array (SIZE(V1,1)+SIZE(V2,1)+SIZE(V3,1) == 6). This
complication is, again, unnecessary.
Defining A(V1,::2,V3) in terms of a single rank-agnostic subscript V
would require V to have shape [5,3,2] or [5,2,3], i.e.,
  V = reshape( [ ( ( V1(:,i), j, V3(:,i), i = 1, 3 ), j = 1, 3, 2 ) ], &
             & [ 5, 3, 2 ] )
or
  V = reshape( [ ( ( V1(:,i), j, V3(:,i), j = 1, 3, 2 ), i = 1, 3 ) ], &
             & [ 5, 2, 3 ] )
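This Python sketch builds the columns of V for either shape and shows
that the nesting of the implied DO loops alone determines the
array-element order of the result. The helper strided_subscript is
hypothetical, and the lower bound of the strided (third) dimension of A
is assumed to be 1. The columns of V1 and V3 are taken from the matrices
displayed above.

```python
def strided_subscript(v1_cols, v3_cols, extent, j_fastest):
    """Hypothetical helper: SHAPE(V) and the columns of V, in array-element
    order, for A(V1, ::2, V3) expressed as A(V).

    v1_cols, v3_cols -- columns of V1 and V3 (one list per column)
    extent           -- extent of the strided dimension (lower bound 1)
    j_fastest        -- True for SHAPE(V) == [5,2,3], False for [5,3,2]
    """
    js = list(range(1, extent + 1, 2))
    ni = len(v1_cols)
    lead = len(v1_cols[0]) + 1 + len(v3_cols[0])
    if j_fastest:   # inner loop over j, outer loop over i
        order = [(i, j) for i in range(ni) for j in js]
        shape = [lead, len(js), ni]
    else:           # inner loop over i, outer loop over j
        order = [(i, j) for j in js for i in range(ni)]
        shape = [lead, ni, len(js)]
    return shape, [v1_cols[i] + [j] + v3_cols[i] for i, j in order]


V1_cols = [[1, 4], [2, 5], [3, 6]]
V3_cols = [[2, 1], [3, 7], [4, 5]]
for j_fastest in (True, False):
    shape, cols = strided_subscript(V1_cols, V3_cols, 3, j_fastest)
    print(shape, cols)
```

Dropping the first extent of SHAPE(V) gives the shape of A(V), [2,3] or
[3,2], so no separate rule for array-element order is needed.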
If A(V) were allowed with RANK(V) > 1, allowing A(V1,::2,V3) would not
be necessary, and there would be no question whether the result has
shape [2,3] or [3,2], as the shape would be given by the extents of V
after the first one. This eliminates the question of array-element
order in A(V1,::2,V3) when RANK(V1) == RANK(V3) > 1.
The formulation based upon a single array could be presented in an
example or in Annex C.
Proposals
=========
Allow A(B), wherein SHAPE(B) == [ K, e1, e2, ..., en ] and K is a
constant expression equal to the rank of A, to specify an array of rank
max(1,n). If n > 0, SHAPE(A(B)) == [ e1, e2, ..., en ]. If n == 0,
SHAPE(A(B)) == [1]. Each extent-K section in the first dimension of B,
i.e., each B(:,j1,...,jn), is used as the set of subscripts of one
element of A. For example, from 18-247:
Suppose we have arrays A with dimensions (10,10,10) and B with
dimensions (3,2). If we assume
                                                  [ 3 6 ]
    B = reshape( [3, 4, 5, 6, 7, 8], [3,2] )  =   [ 4 7 ]
                                                  [ 5 8 ]
then A(B) is a rank-1 extent-2 array that can appear in a variable-
definition context (except as an actual argument associated with a dummy
argument having INTENT(OUT) or INTENT(INOUT)); it specifies the same
array as [ A(3,4,5), A(6,7,8) ], which cannot appear in a
variable-definition context. This is different from
A(B(1,:),B(2,:),B(3,:)), which can appear in a variable-definition
context, but is an object with shape [2,2,2], not [2]. The former is an
arbitrary collection of elements of A, while the latter is a
rectangular section of A.
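The difference between the two designators can be made concrete by
listing the subscripts each one selects. This is a Python sketch with
hypothetical helper names; the section's subscripts are generated in
array-element order (first subscript varying fastest).

```python
from itertools import product


def gather_subscripts(b_cols):
    """Hypothetical helper: subscripts selected by A(B), one element of A
    per column of B."""
    return [tuple(col) for col in b_cols]


def vector_section_subscripts(vectors):
    """Hypothetical helper: subscripts selected by A(V1, V2, ..., Vk) with
    vector subscripts, i.e. every combination, first subscript fastest."""
    # itertools.product varies its last argument fastest, so reverse the
    # argument list and then reverse each resulting tuple.
    return [tuple(reversed(t)) for t in product(*reversed(vectors))]


B_cols = [[3, 4, 5], [6, 7, 8]]           # B(:,1) and B(:,2)
B_rows = [list(r) for r in zip(*B_cols)]  # B(1,:), B(2,:), B(3,:)

gathered = gather_subscripts(B_cols)
section = vector_section_subscripts(B_rows)
print(len(gathered), gathered)   # prints 2 [(3, 4, 5), (6, 7, 8)]
print(len(section), section[:3])
```

A(B) selects 2 elements, while A(B(1,:),B(2,:),B(3,:)) selects all
2 x 2 x 2 == 8 combinations, which include the 2 elements of A(B).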
If RANK(B) == 1, the specification that SHAPE(A(B)) == [1] avoids a
conflict with the rank of an array section that has a vector subscript.
According to R920, a section subscript can be a vector subscript.
According to C927, a vector subscript shall be an integer expression of
rank one. According to 9.5.3.3.1, A(B) is an array section. According to
9.4.2p2 and 9.5.3.3.1p2, the rank of the array section is the number of
... vector subscripts in the list. Therefore, if A(B), in which
RANK(B) == 1 and SHAPE(B) == [1], is considered to be a rank-agnostic
subscript reference, it cannot have rank zero, i.e., it must be an
array.
Allow A@(B), wherein SHAPE(B) == [ K, e1, e2, ..., en ] and K is a
constant expression equal to the rank of A, to specify an object of
rank n; i.e., if n == 0, A@(B) specifies a scalar. Each extent-K section
in the first dimension of B is used as a set of subscripts for A.
Do not allow, or attempt to explain the array-element order of,
denotations of the form
  A ( V1, ::2, V2 )
  A @( V1, ::2, V2 )
wherein V1 or V2 can be arrays, because these can be expressed in the
form A(B).
Allow a function to be referenced with "@" before its argument list, in
which "@" has no effect.
If updaters are provided, e.g., as described in 19-170, allow an updater
to be referenced with "@" before its argument list, in which "@" has no
effect.
All of the prohibitions related to vector subscripts apply to the
present proposal. If vector subscripts are described as having two
variants, one as presently described in 1539-1 and the other as
described here, the tedium of finding every occurrence of "vector
subscript" and replacing it with "vector subscript or " would be
avoided.