J3/98-209R2

Date:        11 Nov 1998
To:          J3
From:        R. Maine
Subject:     Specs and  Syntax for M.25, Stream I/O

Significant changes from the R0:
  1. Random positioning feature is part of base requirement.
  2. Added formatted stream I/O.
  3. Special compability rules for character type.
  4. File storage unit recommended (but not required) to be 8 bits
  5. Reading data with a diferent type than it was writen gives
     processor-dependent result instead of being illegal.
     (I.e. its much like a TRANSFER).

I. BACKGROUND

Stream I/O is item M.25 on the f2k work plan.  It is the only item
on the work plan that has not been addressed in some manner.  The
rationale for this item is given in items 63 and 63a of the wg5
repository, N1189.

I consider this item to be of importance both in itself and also
as a component of C interopability.  Other work on C
interopability has focussed on interoperating with C code, but
interoperating with C data files is also an important item.  It
would not be particularly convenient to tell the users that in
order to work with C files, they need to write C code to do so and
then call that C code using the other C interopability features.
Furthermore, as mentioned in the quoted rationales below,
byte-stream files have become a de-facto standard far beyond the
direct scope of the C environment.

From item 63:

| Rationale: C-style "byte stream" has become a de facto standard
| far beyond the direct scope of the C environment. In a
| scientific application it is not surprising to have a sensor
| feeding a stream of data to a processor which in turn feeds the
| results over a heterogeneous network for additional processing.
| Fortran record structure provides the user with an obstacle to
| overcome in this scenario (the processors may not have the same
| record conventions (even when the CPU architecture is the same),
| etc.)

And from item 63a:

| there is a category of files that are definitely not record
| oriented.  This category is called "binary stream files". These
| files are merely constituted of a continuous sequence of storage
| units, without any internal structure. Stream files are
| prevalent in many operating systems such as Unix, DOS, Windows
| and OS/2. Also, there are "industry-standard" file formats that
| are not record oriented, such as GIF and TIFF formats for
| digital images.
|
| Accessing stream files with standard Fortran I/O facilities is
| often difficult: unformatted sequential access may fail because
| the file contains no record delimiters. Using unformatted direct
| access is also awkward since the data cannot be accessed easily
| with fixed record lengths. In short, a new file access is
| needed.

I consider this work item to be of far higher importance than
might be inferred by just looking at its current position in the
work plan.  It is also not a particularly difficult item to do,
either in terms of standards work or implementation.  The impact
on the standard is fairly localized, and many implementations
already do something simillar as an extension.

II. SPECIFICATIONS

Considering the late date, I believe that a fairly minimalist
approach to this work item is appropriate.  WG5 item 63 mentions
stream versions of all combinations of formatted/unformatted and
direct/sequential, with a new specifier on the OPEN statement.
WG5 item 63a restricts itself to unformatted i/o and adds stream
as a new kind of access instead of as a new specifier.  The curent
paper proposes something more along the line of item 63a, with a
new kind of access.  Both formatted and unformatted I/O statements
are allowed on a stream file, but there is a common underlying
file model.

I earlier considered specifying the new model with the form
keyword, which has some precedent.  But this did not seem to
integrate as well as one would like.  It raises questions about
how to interpret the access keyword when the form is specified to
be stream.  On reconsideration, I am proposing that the new model
be specified with the access keyword as an alternative to
sequential or direct.  This appears to integrate far better.
Indeed, much of what the standard already says about the data in
unformatted files is the same stuff that needs to be said about
these new files.  It is only in matters of record structure that
they should much differ.

Detailed Specification: only new keywords and options in OPEN,
READ, WRITE, and INQUIRE are needed.

A stream file consists of a sequence of storage units.  The
storage units are numbered from 1 to n.  Two concepts are present
in a stream file. A file position pointer is used to locate the
next storage unit to be read or written.

The storage unit terminology will also be used in the description
of unformatted direct and sequential access files.  The f95
standard defines a concept for the unit of measure for these kinds
of files, but it does not give a name to that concept.

It is recommended, but not required, that the file storage unit be
8 bits.  The recommendation would likely provide substantial user
pressure for implementations to follow the formal recommendation.
But by making it a recommendation instead of a requirement, those
processors that have suficiently good reason to do otherwise may
do so and still claim conformance with the standard.  (The
definition of "sufficiently good reason" is purely a matter
between the vendor and their users - the standard would not get
into such a question or even use that terminology).  ISO
specifically provides for recommendations that are not
requirements; it is just a provision that we have not made much
previous use of.

Opening a stream file could be done by simply adding
ACCESS='STREAM' in the OPEN statement. The POSITION specifier is
valid for these files and is interpreted identically to with
sequential files.

A file opened for stream access is considered to be connected for both
formatted and unformatted i/o.  Both formatted and unformatted I/O
statements may be intermixed on the same connection.  It is not
allowed to specify a form= specifier in an open with access="STREAM".

Stream I/O shall work in an hybrid fashion between sequential and
direct access. Sequential access shall be done by using the
syntax of sequential READ and WRITE, except that the unit is
connected to a stream file. The file position pointer is moved by
the amount of data storage units transferred by each READ or
WRITE statement executed.  Random access shall be provided by
adding a POS=location specifier to the READ or WRITE
statements. Mixed access shall be allowed for the same unit.
When a WRITE statement overwrites a portion of a stream file,
only the amount of storage tranferred shall replace the existing
locations; the remaining storage units shall remain intact (in
the contrary of conventional sequential WRITEs).

There are no record boundaries in stream files, so all references
to records in formatted i/o do nothing.  It is allowed, for
example, to have a "/" in a format used with stream I/O, but it
has no effect.  It is also alowed to specify advance="yes", but
that likewise has no effect.  A user may choose to explicitly
write such things as linefeed characters to the file, but they are
given no special interpretation while the file is connected for
stream acess.  T edit descriptor are allowed, but as with
non-advancing I/O, the tab positions are relative to the start of
the current i/o statement.

List-directed and namelist formatting are also allowed, with all
references to record boundaries ignored.  Namelist comments are
not allowed because they are inherently tied to record boundaries.

READ and WRITE statement with POS specifiers but with an empty I/O
list merely move the pointer inside the file. In the case of a WRITE
statement, if the position pointer is moved with the POS specifier
beyond the end-of-file marker, the gap is filled with unitialized
data.

The BACKSPACE statement shall be disallowed for such files, since
there are no record delimiters.

The ENDFILE statement is used to truncate the stream file at the
current file pointer position.  An explicit ENDFILE is the only
way to truncate a stream file.  There is no implicit ENDFILE
on close or elsewhere.

New specifiers shall be added to the INQUIRE statement.  In
particular, a CURRPOS specifier to obtain the current position of
the pointer, and a FILESIZE specifier that returns the size of
the file in file storage units. The ACCESS specifer shall also be
extended to allow ACCESS='STREAM' to be returned.

The number of file storage units used by a formatted stream
I/O statement is the same as would be used for unformatted
I/O of a character string with the same number of characters.
Indeed, it is allowed to write a set of storage units with
a formatted stream i/o write and then subsequently read those
units with an unformatted stream i/o read of a character
variable of default kind (or conversely).

Likewise, it is allowed, for example, to write a character*10
variable with an unformatted stream write and then to subsequently
read those file storage units with an array of 10 character*1's.
This is an exception to the general requirement for compatability
of type and type parameters.  Note that there is already a simillar
exception for default character procedure arguments.

It is also proposed that this compatability exception for character
type be applied to unformatted sequential and direct access
files as well as stream ones.

The current standard says that it is illegal to read from an
unformatted file using a data type or type parameters that are
different from the type and type parameters used in the write.  It
is proposed that this prohibition be changed to say that the
resulting values are processor-dependent.  (In essence, it would
be like a TRANSFER intrinsic).  This is in accord with likely
user expectations.  And it alows, for example, things like reading
in a block of data as an array of one type and then using
TRANSFER to extract parts of it as diferent types; this kind
of functionality is a common requirement for cases where the
types of individual words may not be known until after the
block is read in.  This change would also apply to sequential
and direct access unformatted files for consistency and to
conform to common expectations and existing practice.

And one additional spec peripherally related: Delete the
prohibition against namelist with internal I/O.  Allowing internal
namelist I/O will help facilitate getting the effect of formatted
stream I/O by using unformatted stream I/O in conjunction with
internal I/O.  But this is not critical and can be dropped if
there is objection.  And even if we allow formatted stream I/O, it
will improve consistency to also allow list-directed and namelist
internal I/O.

III. SYNTAX

Much of the syntax follows fairly obviously from the specifications,
with possible minor quibbles about spelling.

A. The OPEN statement.

   ACCESS='STREAM' is allowed in the OPEN Statement.  In such cases,
   the file is open for both formated and unformatted I/O.  It is
   not allowed to specify a FORM=.

   RECL is not allowed.

   ASYNC is allowed.

B. The CLOSE, WAIT, REWIND, and ENDFILE statements

   No syntax changes.  EOR is allowed in WAIT, but will never happen.
   (We already allow it for other cases where it can't happen).

   The specs describe the different interpretation of ENDFILE.

   Take out the prohibiton against namelist on internal files.

C. BACKSPACE is disallowed.

D. The PRINT statement and the form of READ without an
   io-control-spec-list are disallowed  (because they refer only
   to standard in/out, which are sequential formatted).

D. The READ and WRITE statements.

   Identical in syntax to other READ/WRITE statements.

   Can not have REC=.

   May have ADVANCE= and EOR= but they have no effect

   May have ASYNC= and ID=.

   Add a new POS=scalar-int-expr specifier allowed only for stream files.
   If this is specified, the file is positioned to the specified position
   prior to the data transfer.

E. The INQUIRE statement.

   May return 'STREAM' as a value for access.

   Add a STREAM= specifier just like SEQUENTIAL= and DIRECT=.  (It is
   possible that a processor might not allow stream access to all files).

   Add a SIZE=scalar-default-int-variable specifier that returns the
   file size in the same units as used for REC=.  (The wg5 item
   suggested an example spelling FILESIZE, but I don't think that
   necessary, though I'd accept it if the majority prefers; file_size
   would be another obvious alternative along that line).  This
   returns a value of -1 if the file size cannot be determined (for
   example, if the file is a device instead of a disk file).  SIZE=
   may also be used for sequential and direct access files; in those
   cases, the file size might not be the same as the amount of data
   written to the file (i.e. the processor can return the actual file
   size; it doesn't have to do anything like keep track of how much of
   the size is user data versus how much is record headers).

   Add a POS=scalar-default-int-variable specifier that returns the
   current file position.  This returns -1 if the position cannot be
   determined.  The result of POS= is undefined for sequential and
   direct files.  The wg5 item used the example spelling CURRPOS, but
   I find that a bit awkward.  I think it better to use the same
   spelling for the specifier in the INQUIRE statement as the one in
   the READ/WRITE statements.

F. Derived type I/O.

   Uses the derived type I/O routines with no changes.
   (Those routines already look almost more like stream I/O than
   record-oriented anyway - within the DTIO routine you don't get
   any file positioning before or after a read or write).