To: J3                                                     J3/19-109
From:    Van Snyder
Subject: Synchronizing variables
Date: 2019-January-16
References: N2162, 18-115, 18-237, 18-243, 18-246, Parts 6 and 7 of 15-166

Introduction
============

There are numerous restrictions on the interactions concerning access to
variables in different iterations of DO CONCURRENT constructs, and
coarrays in unordered segments.  These restrictions would also apply to
blocks of a fork-join construct or asynchronous constructs (see
18-243).  They could be relaxed or eliminated by a different category of
storage, called I-Structure, invented by Arvind, Rishiyur Nikhil, and
Keshav Pingali, and described in CACM in 1989.

Paper history
=============

Meeting 217 minutes state that /HPC announced on Monday 15 October that
there would be no /HPC action on 18-246; it would instead be considered
by /Data in conjunction with 18-237.  18-237 was rejected, and there was
no further consideration of 18-246, either by /Data or plenary.

Use Cases
=========

1. Reference a variable assigned a value in one iteration of a DO
   CONCURRENT construct, in a different iteration.  This is presently
   prohibited by 11.1.7.4.5p3 and p4.

2. Reference a coarray assigned a value by a different image in an
   unordered segment.  E.g., The work queueing example in 11.6.10 Note 3
   could be implemented without explicit locks.  Without using explicit
   locks, this is prohibited by 11.6.2p3.

3. Reference a variable assigned a value in one block of a fork-join
   construct, from within a different block.

4. Reference a variable assigned a value in one asynchronous construct,
   from within a different asynchronous construct.

5-8. Same things for allocation status, pointer association status.

9-12. Detect that several iterations or images have attempted to assign
   a value or meddle with allocation or association status.

Proposal
========

Define an attribute for a derived type, maybe named SYNC or SIGNAL, that
indicates objects of that type have two states: "Empty" and "Full."  The
attribute would apply to all objects of the type.  Allow to specify the
attribute for objects independently of whether the attribute is
specified for the type.  Allow to specify the attribute for objects of
intrinsic type.

For arrays, allow to specify whether the attribute applies to the whole
array, or to each element separately.

A value can only be assigned to the entirety of an object that has the
attribute, not to subobjects of it separately.  There is no concept of
"partly empty" or "partly full."

Alternatively (or additionally), the functionality could be provided by
an extensible type, defined in ISO_Fortran_Env, that has non-overridable
"magic" assignment and defined input routines that operate on its
extensions.  This is a bit clunky if you need a synchronized object of
intrinsic type because then every reference needs additional syntax to
dereference a component.  It might be made more palatable if the
synchronizer type were accompanied (in the intrinsic module) by
extensions for each of the intrinsic types, with type parameters, and
type-bound intrinsic operations.

If a [subobject of] a variable of a type with that attribute is
referenced when it is empty, execution is suspended until it is assigned
a value.  For coarray or coindexed variable references, this has the
same effect as waiting for a lock.  There is presently no equivalent for
DO CONCURRENT.  If it is referenced by several iterations or several
images, the suspended iterations or images are queued, maybe with a
queue whose head is referenced by a private pointer in the referenced
object, or perhaps aliased to the object itself.  This is especially
difficult to do by waiting for event (or lock) variables if several
images need to wait for the value to be provided.  Again, there is
presently no equivalent for DO CONCURRENT.  This method doesn't need to
declare locks or events explicitly, post to them explicitly (and know to
which ones to post), wait for them explicitly, or know which one to wait
for.  At least for interactions between images, this proposal could be
implemented using events or locks, or perhaps more efficiently "under
the covers" -- but that won't work for DO CONCURRENT iterations because
we put too many restrictions on lock and event variables, i.e., they
have to be coarrays, and an image can't wait for a lock it has locked,
even if a different iteration of a DO CONCURRENT executing on the same
image might unlock it.  Hardware might someday make this work very much
like split-phase memory transactions in Denelcor HEP or MIT Monsoon or
Tera/Cray MTA.  Having the facility in Fortran might put some pressure
on hardware developers to provide it, much as array operations urged
hardware vendors, e.g. Intel, to provide, e.g. SSE extensions.

Reference to an empty non-coarray variable of this type outside a DO
CONCURRENT construct (or asynchronous block -- see 19-243 -- or
procedure -- see 18-237), or to a coarray variable on your own image
(outside a DO CONCURRENT construct), causes deadlock.  As with waiting
for events or locks, more subtle kinds of abuse can also cause deadlock.

If it is assigned a value when it is "Full" an error condition occurs
(exceptions proposed in 18-115, or the ASSIGN statement proposed in
15-166, would be useful here).  For DO CONCURRENT, this is simply
prohibited in different iterations.  When it happens, it causes a race
condition that is not easily detectable.  For coarrays, the rules about
references in unordered segments apply, but the race conditions are just
as real, and difficult.  All that's necessary to cause an inscrutable
non-repeatable problem is to omit one SYNC.  This proposal does the SYNC
automatically.

Such a variable is initially "empty."  There is no concept of "undefined
full status."

Provide a statement or intrinsic subroutine or type-bound procedure to
"empty" it, e.g., before starting a DO CONCURRENT construct.  This is
equivalent to unlocking a lock.

Provide an intrinsic function, or one in an intrinsic module, or a
type-bound one, to inquire whether it is "empty."  Or maybe not, as
somebody might be tempted to spin on it instead of having the processor
queue on it.  There is no intrinsic function to inquire whether a lock
is locked.

This proposal bears a similar relationship to locks and events that
coarrays bear to MPI or PVM.  In both cases, the new facility is much
simpler to use, and more likely to be used correctly.

I-Structure History
===================

Arvind and his grad students implemented I-Structures in software in
GITA (Graph Interpreter for Tagged-token dataflow Architecure -- and his
Arvind's wife's name) to run Id on Unix workstations.  Id is a language
designed for parallel demand-driven dataflow programming.  I-Structures
might also have been used in pH, a confluence of Id and Haskell.

Greg Papadopoulos used I-Structures to implement the Monsoon dataflow
computer nodes, for his Ph.D. project at MIT.  I don't know whether
their semantics were provided by special purpose hardware, extra gates
etc., firmware, or software in the special-purpose variations of
Motorola 88100 processors that Papadopoulos used.  88100 had room for
about 128,000 more transistors than Motorola used for "stock"
processors.  Los Alamos was sufficiently impressed to have bought twelve
Monsoon processors.  When they were retired, at least one was given to
the Computer History Museum.  USC bought one too.  I don't know how many
nodes the Los Alamos or USC processors had.

January 15-16 e-mail discussion of Performance Portability
==========================================================

E-mail discussion 15-16 January discussed the importance of emphasizing
the superior usability for parallel programming inherent in Fortran
coarrays and CONCURRENT constructs.  Automatic synchronization
immediately improves the productivity of parallel program development.
The alternatives to Fortran that are frequently put forward offer
nothing similar to synchronized variables.