J3/01-119

To:       J3
From:     Craig Dedo
Date:     February 9, 2001
Subject:  Design Considerations for Stream I/O

Analysis

The two unresolved issues about stream I/O, 127 and 128, suggest that there may be some serious design defects, not just some editorial fixes to be done. This paper discusses these design considerations and attempts to construct a solid foundation for resolving these two issues. It attempts to provide a rigorous model using a general approach starting with first principles. Although many of these ideas are well known and some of them are already part of the normative text of Fortran 2000, I believe that it is useful to present them in as well organized and coherent a manner as possible.

It is my position that stream I/O, like all other I/O access methods, is intimately connected with the host operating system and file system and must take file system properties into full consideration in order to work well.

The original purpose of stream I/O was to allow users of Fortran to access C-style "byte stream" files or, alternatively, files that have no internal record structure. This is explicitly stated in the two WG5 work items, 63 and 63a, proposing stream I/O.

Following are some design principles and assumptions that form the foundation of this analysis.

1. Implementation details should be left to the processor.
2. We should design for all commercially significant operating systems and file systems.
3. Fortran compilers should work well on a wide variety of operating systems and file systems. No one operating system or file system should dominate the design of Fortran.
4. Fortran compilers should honor the standards and conventions of the host operating system and file system(s). If the operating system and file system are silent on an issue, then the Fortran compiler is free to do as it pleases.
5. Operating systems can support more than one file system, often simultaneously on the same system.
A good example is Microsoft Windows NT, which can have some disk volumes with the FAT file system and other volumes with the NTFS file system on the same system at the same time.
6. No concept of a file is universal, even though some concepts are very widespread.
7. We do not know what file systems will dominate computing 10 to 20 years from now. There is no guarantee or even a high likelihood that the file systems which are predominant today will continue to dominate computing.
8. In the next 10 to 20 years, we may have commercially significant installable file systems that are designed by parties other than the vendor of the operating system, such as commercial third parties or even by the user through development kits.
9. Stream I/O in Fortran is an access method, not some other kind of file attribute. This is a correct design decision.

We should rigorously distinguish the concepts of access method, record structure (a.k.a. record type), and data format. Although these three concepts are closely related, they really are independent concepts that can be clearly distinguished.

An access method is the way that the program finds the data in a file. Previous versions of Fortran allowed only two access methods, sequential and direct. J3 is now adding stream access to Fortran 2000.

A record structure, or record type, is the way that records are organized and marked off from one another. A file system may support more than one record structure. For example, a file system may support variable-length records, fixed-length records and stream records.

Data format specifies whether the data is read and written using formatted input/output statements or is unformatted.

It may be possible to access a file of a given record structure in more than one way. This possibility is explicitly anticipated in the normative text of the Fortran 2000 draft [161:18-21, 172:8-11].

File systems can vary widely in complexity and internal structure.
At one extreme is the Unix-style concept of a file: "A file is nothing more than a stream of bytes." At the other extreme is the OpenVMS RMS (Record Management Services) file system, which has a very complex internal file structure and record structure.

It may be useful to draw an analogy. One could consider file systems to be strongly or weakly typed, just like data types in programming languages. In weakly typed languages, data objects are given a data type, but it is relatively easy to look at the data in an object as if it were of another data type without generating an error condition. In contrast, in strongly typed languages, treating a data object as something other than the data type it was declared to be is an error and generates an error condition.

Similarly, file systems can be classified as weakly typed if the data in the records can be accessed as if it had two or more record structures, e.g., by variable-length records or by stream. There is little or no difference in the record structure. In contrast, in a strongly typed file system, a given record structure is carefully defined and differentiated from other record structures. If a file is created with one kind of record structure and then there is an attempt to access it as if it had a different kind of record structure, an error occurs.

If stream I/O is to work properly in Fortran, it must work equally well on both weakly typed and strongly typed file systems. This means that any characteristic or attribute of a file system that varies from one file system to another needs to be left to the operating system and file system. Hence, any issue that concerns such file system attributes is necessarily processor-dependent.

If a file system recognizes more than one internal structure, it may be allowable to read the data in an existing file with a given record structure only by using one particular access method or by using more than one access method.
If more than one access method is allowed, the processor may be able to detect the file's record structure or the user may need to specify which record structure is in use through means not specified in the Fortran standard. An example of the latter would be one or more nonstandard I/O keywords that specify the internal file and record structure.

The same problems also exist when creating a file or writing to an existing file. Different access methods may, by default, create files with different internal file or record structures. If the host operating system or file system allows the Fortran compiler to use an access method to write to more than one kind of file or record structure, then the Fortran compiler must have some way of determining which structure to use.

Here is an example. Consider a hypothetical file system that has at least two different record structures, variable length sequential records and stream records. It is possible for the file system to support all four of the following combinations:

    Access Method    Record Structure
    Sequential       Variable Length Sequential
    Sequential       Stream
    Stream           Variable Length Sequential
    Stream           Stream

There is no requirement in the Fortran 2000 draft that a Fortran processor using such a file system needs to support all four combinations.

A related issue is how file systems mark the end of a record (EOR). There are many different ways of doing this with operating systems and file systems that are commercially important today. The following table lists the methods used by several file systems today.

    File System               EOR Method
    MacOS
    Microsoft Windows FAT
    Microsoft Windows NTFS
    Unix
    VMS RMS                   Depends on record structure (record type)

Issues

Here is the full text of the two issues related to stream I/O.

Issue 127:

I'm not convinced that end-of-file conditions are fully covered for formatted streams. Note that there is no endfile record in a formatted stream (and I doubt we want there to be one).
A strict reading of the 2nd sentence of 9.2.3.2 would tell me that it didn't apply because the endfile wasn't a result of reading an endfile record, but that's subtle. I'd suggest explicitly adding something about sequential; didn't do that myself in case someone thinks that this should apply.

What happens when reading a partial record at the end of a file? We say that there may be partial records, but I don't see where we ever say what the effects of such a thing are. If there is a partial record at the end of a file, is it possible to position after it so that it is the previous record? Should, perhaps, reading past the end of a partial record be an error instead of an EOF or EOR condition? Does padding apply to partial records?

Some of these questions are probably best answered elsewhere than in 9.2.3.2, but I'll lump them all into one J3 note.

Issue 128:

The words in 10.5.3 about linefeeds in A output imply to me that a nonadvancing formatted stream output statement that writes a linefeed as the last character in a stream file will cause there to be an empty and incomplete record at the end of the file. Is this empty incomplete record supposed to be distinguishable from having no record? If so, I wonder how Unix-like systems are supposed to distinguish it. If not, I wonder whether we have it described correctly. Same with / editing, where this was just copied from.

These holes leave me unconvinced that the description of record handling "just works" with formatted stream I/O. This related to unresolved issue 127 about handling of incomplete records.
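
The question in Issue 128 can be made concrete with a small sketch. It is written in Python only so that it is self-contained, and it does not describe the behavior of any Fortran processor. The two function names are hypothetical. Each models one possible reading of a linefeed in a byte-stream file: as the terminator of the record before it, or as a separator between two records. Each returns (record, complete) pairs.

```python
def records_lf_terminates(data):
    """Model (assumption): each LF terminates the record before it.

    Bytes after the last LF, if any, form a final incomplete record.
    A trailing LF therefore leaves no record after it.
    """
    records = []
    start = 0
    while True:
        end = data.find(b"\n", start)
        if end == -1:
            break
        records.append((data[start:end], True))
        start = end + 1
    if start < len(data):
        records.append((data[start:], False))
    return records


def records_lf_separates(data):
    """Model (assumption): each LF separates two records.

    Whatever follows the last LF is a record even when it is empty,
    so a trailing LF yields an empty incomplete record.
    """
    parts = data.split(b"\n")
    last = len(parts) - 1
    return [(part, index < last) for index, part in enumerate(parts)]


stream = b"abc\n"  # a stream file whose last character is a linefeed
print(records_lf_terminates(stream))  # [(b'abc', True)]
print(records_lf_separates(stream))   # [(b'abc', True), (b'', False)]
```

Both functions receive exactly the same bytes, so the difference lies only in the interpretation. On a file system that stores nothing but the bytes, the file itself does not record which reading was intended, which appears to be the difficulty the issue raises for Unix-like systems.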
References

01-007, Fortran 2000 Draft
98-209r2, Specs and Syntax for M.25, Stream I/O
98-211r2, Edits for M.25, Stream I/O
99-110r1, Stream I/O - Suggested Changes (Unresolved Issue 68)
01-102, Changes to List of Unresolved Issues
01-139, Issue 127 - End-of-File in Formatted Stream Files
01-140, Issue 128 - Empty Incomplete Record
Compaq Computer Corporation, Guide to OpenVMS File Applications, Chapter 2, "Choosing a File Organization" (Web site: www.openvms.compaq.com:8000/72final/4506/4506_pro)

[End of J3 / 01-119]