X3J3/96-061 Date: April 3, 1996 To: X3J3 From: Richard Bleikamp Subject: Derived Type I/O A previous version of this paper was emailed to the x3j3 alias. This revision incorporates some of the comments from Richard Maine and Baker Kearfott. I am still looking for alternative approaches to providing the desired functionality. Please send me any suggestions for references/approaches. - The approach described here is somewhat similar to what Ada users might expect, if we chose to overload a small set of function names. A C++ user would also see similarities. Neither C++ nor Ada provide any implicitly defined support for formatted derived type I/O. As with the Async I/O paper presented at the last meeting, this paper is intended primarily to generate discussion on this topic, and to serve as part of a tutorial on derived type I/O. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Overview The obvious mechanism for supporting formatted derived type I/O is to allow the user to provide their owning formatting routines. These routines would be called by the I/O runtime library, when a derived-type item is encountered in the I/O list during formatted I/O. Assumptions and comments (these are new): - Internal I/O would be allowed in the user written formatting routines. This is an extension to the standard, which prohibits recursive I/O operations now. - Baker Kearfott pointed out a unique requirement for interval arithmetic I/O, namely the ability to tell the I/O library which way to round "interval"s (upper and lower bounds) when doing I/O, and a requirement on the library to get it "right". I think this requirement can best be dealt with separately, and in a general manner (i.e. add edit descriptors (similar to BN/BZ) which control the rounding mode). The following questions come to mind: - Edit descriptors: + should we provide new edit descriptors for this capability (i.e. FORMAT(UA12.2)), + allow some existing edit descriptors to be used ("G" comes to mind) ? + both + undecided - should the usual "w.d.m" syntax be allowed to specify the field width, etc., and "w" be required to be the actual field width ? + yes + no + undecided - should the user supplied formatting routine be allowed to cause a record to be written/read ? + yes + no + undecided I think not, if the user needs to use multiple records for I/O on a single derived type item, then they should be required to provide their own routines to do it. Extending this proposal to support record processing in the user's I/O routine complicates the interfaces and increases the implementation cost. - how do we handle error conditions ? especially in the user's formatting routine. - how does the I/O library actually interface with the user's formatting routines ? + should just one routine name be "called" from the I/O library, thereby requiring the user to overload these routines, based on the "type" of the list item ? (richard maine's preference) (does this interfere with sequenced derived types or different modules overloading the same routine names for different derived types? (i don't think so)) + or should the user be required to supply a uniquely named set of routines for each derived type (on which I/O is performed) ? (a derived type defined in a MODULE should probably have the MODULE name included in the routine names somehow) + undecided - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - One possible approach: The following are my current thoughts on all this. It is provided to suggest one possible compromise between functionality, ease of use, and implementation cost; and because I like getting beat up :). It may also clarify some points. THIS IS NOT a proposal yet. We need to decide on the funtionality required before we decide anything at this level of detail. - Allow the A, E, F, G, I, and L edit descriptors to be used with a derived type value. - Also allow the "UA", "UB", ..., thru "UZ" edit descriptors. These edit descriptors would only be valid for derived type items. We could also allow this edit descriptors to be used for intrinsic datatypes, calling the user provided formatting routine for that intrinsic datatype. - Pass the edit descriptor specified in the format to the user's format- ting routine, as a CHARACTER (LEN=2 ?) argument. - Require the user to provide TWO routines for any derived type used in an I/O list, one for reading, one for writing. Actually, only the routine matching the I/O statement actually need exist. - The interface for the user provided formatting routines is fixed, as below. The names of these routines are pre-determined, based on the "real" name of the derived type (before USE renaming occurs, similar to namelist variable names today). ALTERNATIVE, overload just TWO function names, one for READs, another for WRITEs. The "write" routine interface will be INTERFACE INTEGER FUNCTION formatted_write_xxx (edit_desc,w,d,m,buffer,dt) ! where "xxx" is the name of ! the derived type. INTEGER, INTENT (INOUT) :: w INTEGER, INTENT (IN) :: d, m CHARACTER*2 edit_desc CHARACTER (LEN=132), INTENT(OUT):: buffer ! could be LEN=w, but we may want ! to support "w=0" for minimal ! field width editing. is 132 enough? TYPE (xxx), INTENT(IN) :: dt END FUNCTION END INTERFACE The routines will return a "status" code, zero for success, non-zero for failure ? possibly required to be positive ? should probably imitate IOSTAT ? If the routine is successful, the argument "buffer" will have the character string to be copied into the output buffer at the current position. The user's routine is responsible for storing the desired characters into buffer. The format specification MUST supply a "w". It may be "zero", in which case the user's routine should produce the "shortest" practical character string, and return the length of the string in "w". "w" can be defined by the user's routine when "w" was positive on entry, but any value stored shall be ignored by the I/O library. "d" and "m" will be zero if they were not present in the format specification. The user's routine should fill "buffer" with "*"s to indicate an insufficient field width, except when "w" was zero. The dummy arg "edit_desc" will have the edit descriptor from the format specification, left justified, blank padded on the right (if needed). The "read" routine interface will be INTERFACE INTEGER FUNCTION formatted_read_xxx (edit_desc,w,d,m,buffer,dt) ! where "xxx" is the name of the derived type. INTEGER, INTENT (INOUT) :: w INTEGER, INTENT (IN) :: d, m CHARACTER*2 edit_desc CHARACTER (LEN=132), INTENT(IN):: buffer ! could be LEN=w, but may want to ! support "w=0" for minimal field width ! editing. is 132 enough? TYPE (xxx), INTENT(OUT) :: dt END FUNCTION END INTERFACE The read routine is similar to the write routine, with these exceptions. - "w" cannot be zero. This is a restriction on the format specification. See later notes about list directed support. - The buffer will contain the remainder of the characters from the current input record (up to a maximum of ?). Buffer is intent (in). - The variable "dt" is defined, by the user's routine, with the desired value. With the above proposal, I believe implementations will need the follow- ing sorts of changes: - when passing a derived type item to an I/O routine, the compiler will also have to pass the address of the appropriate user written I/O formatting routine. The external name for this puppy should be easily derivable by the compiler at this point. - The runtime library I/O routine for derived types will swallow up the next edit descriptor, and pass the relevent info to the user written formatting routine. For input, this includes passing the I/O buffer to the formatting routine, and updating the current position in the buffer afterwards. For output, when the user routine returns, the returned buffer should be copied into the I/O libraries output buffer. - I have not clearly thought out the impact of non-advancing I/O on all this, but it should be manageable. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Miscellaneous Questions (which don't need to be resolved immediately) : - should list directed I/O be supported, yes ? no ? undecided ? The same user routines could be used, with w=0 or -1 to indicate list directed I/O. If you think minimal field editing and list directed I/O are equivalent, "0" is a good choice. Richard Maine suggested using a different edit descriptor string to indicate list directed I/O, such as "* ". There are still some questions to be answered about expected behavior for the user's routine when doing list directed I/O. Multiple record stuff may be a problem. List directed I/O often introduces "extraneous" record boundaries. Note that there is a pending interpretation concerning mixed list directed and FORMAT type I/O, when non-advancing is used. - what about namelist I/O. I'm think we should define namelist I/O for derived types in the standard, independent of this proposal. We could do the same for list directed also. - Richard Maine commented: I don't like hard-wired character lengths in interface definitions. How about len=*? This issue here is how long is the buffer? We should probably require some minimum length, but allow implementations to use larger buffers. What we don't want to do is to require an implementation to support infinitely long formatted I/O records. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Rich Bleikamp bleikamp@convex.com or bleikamp@rsn.hp.com Hewlett Packard Company (Convex Technology Center, Richardson TX)