J3/97-153 Date: 15 Apr 97 To: J3 From: Richard Maine Subject: Proposed Specs and Syntax for System Arguments References: J3/97-154 This paper proposes specifications and syntax for system arguments. A draft was circulated earlier on comp.lang.fortran and on the x3j3 mailing list. I have made several revisions based on the comments received about that draft. The main revisions were: added discussions about C interop; increased attention to Posix in several areas; and a few name changes (and spell-checked it). I avoid the prejudicial use of the term "command line"; the term "system argument" seems more neutral. It is left up to the system to determine what constitutes a system argument. (With one minor exception; we specify that the command name, if any, is effectively the 0'th argument. If something on this isn't specified, code portability could be hindered by some systems making it the first argument and some systems not.) The specs and syntax are simple enough that it seems worthwhile to address them in one paper. ARGC/ARGV The C interopability TR (J3/97-154) includes the ARGC/ARGV global variables for access to system argument information. This could be considered to at least minimally meet the requirement for access to system arguments. My first draft of this proposal did not adequately consider this question. After some study of the C interopability TR, I conclude that something additional is needed to provide a facility that is simple to use in a pure Fortran environment. The ARGC/ARGV definition is driven by C interopability issues. Whereas I agree that C interopability is an issue of great importance, the requirement for system arguments exists even for purely Fortran programs that would not otherwise use the C interopability features. Such a user would likely find it strange to have to use the C interopability features. Although the C interopability version is appropriately designed to fit naturally with existing C interfaces that expect ARGC/ARGV, it is fairly complicated to use for simple purely Fortran things. I argue, therefore, that we need two interfaces to the system arguments - the C interopability one plus the one proposed in this paper. Admittedly, it would not be horribly difficult for the user to write a short subroutine to provide a simpler interface to the information in ARGC/ARGV. But it seems worthwhile to standardize this instead of forcing each user to write his/her own. I propose that both interfaces be made available so that a program could use either or even both in the same program. The following simple example serves to illustrate my point that the ARGC/ARGV interface is complicated to use for simple needs. Posit a program that wants exactly 2 system arguments, perhaps input and output file names. Assume the arguments are just character-valued for simplicity. The following is my best quick attempt to do this using the C interopability ARGC/ARGV. (I think it's close, but I might have missed something exactly because it is fairly complicated, bringing in a lot of stuff that seems peripheral to what I'm trying to do). use iso_c !-- of course character(128) :: input_name, output_name !-- for the final results. type(c_char_ptr) :: arg_ptr if (argc /= 2) call oops("Wrong number of args.") arg_ptr = c_dereference(argv, arg_ptr) input_name = c_dereference(arg_ptr) !-- I'm assuming that char(kind=c_char) is the same !-- as default character kind; otherwise I'd give up. argv = c_increment(argv) arg_ptr = c_dereference(argv, arg_ptr) output_name = c_dereference(arg_ptr) This ought to work (I think), but I certainly find it hard to follow (and I had to carefully study the C interop paper while writing this code). POSIX 1003.1 and 1003.9 I've never actually seen a copy of Posix 1003.1, so I hesitate to say much specifically about it. I have at one point seen at least drafts of Posix 1003.9, but I don't seem to have saved them anywhere that I can still find. In my judgment, Posix 1003.9 is a failed standard (and thankfully so). It never received ISO approval and appears to have no active development and little user support. There are certainly some users, but I feel safe in characterizing the user base as small. I do not, therefore, feel constrained to adopt the Posix 1003.9 binding, although we should certainly use it as a source of ideas. There are Fortran 90 features that allow us to make significant improvements over the Posix 1003.9 bindings. Specifically, optional arguments can find good application here, and modules allow us to choose more legible names without unacceptably polluting the global namespace. I have, however, used Posix 1003.9 as a guideline by adopting its approach on issues where there was no substantiative reason for change. This draft reflects several small changes to follow that philosophy. (One example is in the treatment of invalid values for the argument number). It seems reasonable to express this proposal as a binding to the appropriate functionality in Posix 1003.1. If someone gets a copy of Posix 1003.1, then we can look at that. However, I'm not going to propose that we list something as a normative reference unless I actually read the relevant portions that we are referencing. I also would not want to restrict this proposal to Posix systems; there exist several systems of significant interest that make no claim to Posix conformance. I'd therefore suggest that if we do reference Posix 1003.1, it be done with along the general lines of saying that the intrinsics (see below) are bindings to the appropriate Posix 1003.1 definitions of system arguments on Posix systems, but can also be implemented on non-posix systems, in which case the definitions of the system arguments are system-dependent. IARGC/GETARG The approach of this proposal is modeled loosely after iargc/getarg. In fact, we could do worse than just adopt iargc/getarg as a standardization of existing practice (for instance, I think Craig's proposal is worse). I'd probably vote for a proposal to standardize iargc/getarg if the proposal presented here isn't acceptable, but I think this is an improvement. One potential problem with just standardizing iargc/getarg is that existing implementations are not completely consistent with each other. There are, for instance, variations in whether argument numbering starts at 0 or 1 and whether or not the command name is counted. Thus, by standardizing one specification for iargc/getarg, we might invalidate some existing compilers and programs. Similarly, if we standardize iargc/getarg, we might be more constrained against adding enhancements in the future. Besides which, iargc is a pretty poor name choice, though if that were the only problem, I'd probably agree that existing practice would suggest using the name anyway. This paper proposes that we specifically avoid the names iargc and getarg so that a compiler could easily support both the (proposed) standard forms and any existing vendor variations of iargc/getarg for compatibility. NAMESPACE IMPACT This part of the proposal is trivially separable from the rest. Instead of continually adding new intrinsics to the global namespace, where they potentially conflict with user-written procedures, I propose that we follow the lead of HPF, the IEEE exceptions TR, and the ISO_VARYING_STRING module in using modules to control the namespace pollution. We might even go so far as to define module names beginning with ISO_ to be reserved. This is itself a namespace pollution, but a much more limited one than results from adding new intrinsics to the global namespace. For one thing, it would be unreasonable to require all new intrinsics to have names starting with ISO_, but that might be ok for modules. Thus, I propose that the intrinsics below be packaged in an intrinsic module provisionally named ISO_SYSTEM. (All the usual rules for USE statements apply). The proposed name is intended to provide an umbrella for other general system interface intrinsics, should there be any. One might imagine that date_and_time and system_clock might be in the same module if they weren't already global intrinsics. The name ISO_SYSTEM is very provisional; alternatives invited. We can also discuss whether the name should be fairly general (like ISO_SYSTEM) to encourage future additions to the same intrinsic module, or more specific to the system argument functionality. SYSTEM_ARGUMENT_COUNT INTRINSIC Following the style of iargc/getarg, a separate intrinsic function is proposed to return the number of system arguments available. System_argument_count is an intrinsic function. It has no arguments. It returns a scalar default integer that specifies the number of system arguments available. It returns 0 if there are no system arguments available. It returns 0 if the system does not support system arguments. On a system that has a concept of a command name, the command name itself does not count as one of the system arguments. As proposed, this intrinsic function is identical to IPFXARGC from Posix 1003.9, except of course, for the name change and the semantics of being an intrinsic module procedure. The following are some possible variations that were considered, but my recommendation is to stick with the specification above. It seems to "work" best anyway, and even more so when I factor in a bias in favor of following Posix 1003.9 unless there are good reasons otherwise. I consider the name change to be a "good reason otherwise", but the following alternatives don't qualify. 1. We could specify that the function returns some negative value if the system does not support system arguments, thus allowing the user to distinguish between the case where the system doesn't support system arguments and the case where they are supported but there are none. I'm not sure how useful it is to make this distinction and it would mean that user codes would have to specially check for a negative value before using it in things like array dimensions. 2. We could make this a subroutine, in which case a separate argument could return an error indication. I can see arguments on both sides of this question, leading me to give the "deciding vote" to following Posix 1003.9. 3. We could provide access to the system argument count by adding an optional argument to the getarg equivalent, but there is no "nice" place to put it. If the argument count is the first argument, it interferes with simple positional argument usage when retrieving the argument values. If it is the last argument, then potential future extensions could make it strangely placed in the middle of arguments that relate to specific system arguments. Besides which, none of the existing practice (Posix 1003.9 and iargc/getarg) supports this. 4. We could omit this function and require people to access the C interop argc variable. That variable, unlike argv, is pretty simple to use. The only complication is in carefully handling its possibly non-default kind. But I think it inconsistent to provide only half of an interface here. Plus, we have again the precedents from Posix 1003.9 and iargc. GET_SYSTEM_ARGUMENT INTRINSIC The get_system_argument intrinsic is modeled after getarg, with some enhancements. We provide additional optional arguments to support meaningful trailing blanks and to support argument names. The intrinsic is, in principle, extensible to return other system argument properties as extra optional arguments. (Possible examples could include argument hierarchy information, argument type, argument presence, and others). This proposal is also similar to PXFGETARG. The differences are the name change, the optionality of several arguments, and the possible addition of a name argument. The use of optional arguments allows us to combine the simplicity of getarg with the extra functionality of PXFGETARG. get_system_argument is an intrinsic subroutine with the following arguments (in the following order). 1. NUMBER. A scalar default integer intent(in) argument. This is the only argument that is required. It specifies the number of the system argument that the remaining intent(out) arguments give information about. Useful number values are those between 0 and the argument count returned by the system_argument_count intrinsic. Other values are allowed, but will result in error status return (see below). System argument number 0 is a special case, discussed below (and is not included in system_argument_count). The remaining system arguments are numbered numbered consecutively from 1 to the argument count in an order determined by the processor. 2. VALUE. An optional scalar default character intent(out) argument of assumed length. It returns the value of the system argument. If the system argument value cannot be determined, it returns blanks. 3. LENGTH. An optional default integer intent(out) argument. It returns the significant length of the system argument, possibly including trailing blanks. This length does not consider any possible truncation in assigning the system argument value to the VALUE argument; in fact the VALUE actual argument need not even necessarily be present. If the system argument length cannot be determined, a length of 0 is returned. 4. STATUS. An optional default integer intent(out) argument. It returns 0 if the argument retrieval was successful. It returns system-dependent non-zero values if the argument retrieval failed. One reason for failure is a value of number that is negative or greater than system_argument_count(). 5. NAME. An optional default character intent(out) argument of assumed length. It returns the name of the system argument. If the name of the system argument cannot be determined, it returns blanks. Note: we could possibly omit this one. Lots of systems don't have argument names. But some do. Posix doesn't. This one should probably be straw-voted. System argument 0 is always defined to be the system "command name" for the program on systems that have such a concept. It is always allowed to call the get_system_argument intrinsic for system argument number 0, even if the system does not define command names or other system arguments. In such a case, the argument VALUE returns blanks and the argument LENGTH returns 0. Note: as defined here, get_system_argument could conceivably be elemental. That might possibly close some future enhancement possibilities, though. Any opinions on the advisability of making it so? GET_SYSTEM_VARIABLE INTRINSIC The get_system_variable intrinsic is modeled loosely after getenv. The term "variable" seems more generic and appropriate than "environment", but I'd accept the name get_system_environment, or even get_environment if the majority preferred those. I'll note that the number of people who can't spell environment is substantial enough to be a non-trivial issue for those name choices. ("Enviorment" seems to be the spelling choice of many, to judge from Usenet postings - perhaps a biased sample). I recommend avoiding the name getenv because of the potential for conflict with existing and possibly incompatible intrinsics. (I'm not entirely sure that there aren't existing versions of getenv both as a subroutine and as a function, for example). The name "get_env" is too close; it would invite confusion. This proposed intrinsic is also similar to PXFGETENV, with a name change, argument optionality, and the replacement of LENNAME by TRIM_NAME. Note that a system is not required to support any system variables. But a program is always allowed to call get_system_variable anyway, even if no system variables can be successfully retrieved. Get_system_variable is an intrinsic subroutine with the following arguments. 1. NAME. A scalar default character assumed size intent(in) argument. It is required. Its value (possibly after removal of trailing blanks) is the name of the system variable to be referenced. The set of allowed variable names is processor dependent. It is also processor dependent whether system variable names are case sensitive or not. (Ugly, but I don't see that we really have a choice here - just like file names). 2. VALUE. An optional scalar default character assumed size intent(out) argument. It returns the value of the system variable. If there is no such system variable, or if the value cannot be determined, it returns blanks. 3. LENGTH. An optional default integer intent(out) argument. It returns the significant length of the system variable, possibly including trailing blanks. This length does not consider any possible truncation in assigning the system variable value to the VALUE argument; in fact the VALUE actual argument need not even necessarily be present. If the system variable length cannot be determined, a length of 0 is returned. 4. STATUS. An optional default integer intent(out) argument. It returns 0 if the variable retrieval was successful. It returns system-dependent non-zero values if the variable retrieval failed. The most likely error is that there is no system variable of the specified name (or that the system doesn't support any system variables). 5. TRIM_NAME. An optional logical intent(in) argument. If this argument is omitted or has the value .true., then any trailing blanks in the NAME argument are trimmed before interpreting it as a system variable name. If this argument has the value .false., then trailing blanks in the NAME argument may be significant. Note that a system is free to disallow names with trailing blanks or to consider them as equivalent to the same name without trailing blanks. That is, the system may effectively ignore a .false. value of TRIM_NAME. In contrast, a .true. value of TRIM_NAME is always honored. Left entirely to my own designs, I'd probably have required blank trimming for all variable names. But Posix does allow and distinguish variable names with trailing blanks (yuck), so I added this argument for full support of that capability. I'm open to removing it though. Note that PXFGETENV uses an integer lenname argument for this purpose, with the value 0 being a special case flag meaning to trim all trailing blanks. I don't think that makes much sense for intent(in) arguments. It completely ignores the actual length of the passed-in string. It seems to me that you either want trailing blanks trimmed or not (a yes/no choice obvious for a logical). If you want some substring other than the result of trimming, then just use substring notation to pass the desired substring. Note that there is a distinction between intent(in) and intent(out) strings in this regard. Intent(out) strings have no inherent way to return length information, so they do need a separate integer argument. On the other hand, I expect almost nobody to use this argument, so I'm open to the suggestion of making it an integer length as in PXFGETENV (with 0 and omitted both meaning to trim) if that is preferred by a majority. I do feel fairly strongly, however, that it should be the last positional argument instead of the second as in PXFGETENV. Almost all applications will want to omit it, and making it last makes that simpler. Note: Get_system_variable could also conceivably be elemental. I wouldn't expect its elemental properties to be much used, though. Opinions? UNPARSED COMMAND LINE I debated providing an option to retrieve an "unparsed command line", but decided that it was likely to inhibit, rather than help, portability. If we provide two different ways of getting system argument information (individual arguments or unparsed) and we don't mandate support of one of them, then some systems might support only one form while some systems support only the other. This would mean that code with any pretext of portability would have to try both methods and be prepared to deal with either one. I don't want to have to do this in my own code, and neither would I expect other users to want it. This would seem like abdicating our responsibility to define one standard. Even if the form standardized isn't the most convenient for some application, it is bound to be easier for that application to deal with the form conversion than for it to have to deal with not knowing which form will be supported. I've already strayed from the ideal of standardizing one interface by proposing that both the C interop stuff and this proposal be standardized, thus giving two methods. I believe I've shown adequate justification for that because of the significantly different requirements of accommodating existing C routines plus native Fortran usage. Converting to and from the C interop form is pretty "messy". I don't see similarly strong justification for adding a third interface to the same basic information. Since some systems make getting at the unparsed command line awkward, and some GUI systems might not even define such a concept (whereas they might well interpret a drag-and-drop operation as a system argument in some sense), it seems safest to choose the individual argument approach. A command-line based system could choose to call the entire command tail a single argument if no more meaningful division was readily definable. However, if a majority disagrees with me on this point, I'd propose adding a separate subroutine to retrieve the unparsed command line. I'd propose making it as simple as seems practical. That would probably mean 3 arguments: a string for the data, an integer for the length, and an integer status. I'd prefer to avoid complications like distinguishing between the command name and the "tail". CODE SAMPLES use iso_system character*128 :: arg, home_dir, prog_name integer :: i call get_system_variable('HOME', home_dir) call get_system_argument(0, prog_name) arg_loop: do i = 1 , system_argument_count() call get_system_argument(i, arg) !-- Process this argument. end do arg_loop ---------------------------------------------------------------