To: J3                                                     J3/24-109
From: Gary Klimowicz and JoR
Subject: On Fortran awareness in a Fortran preprocessor
Date: 2024-February-22

Reference: 23-192r1


Priorities
==========

JoR has been operating with the following priorities in mind for the
preprocessor:
- Define the minimum viable product. Full compatibility with the C
  preprocessor is not required.
- Try not to break existing uses of the C preprocessor in existing
  Fortran codes.
- Define preprocessor behavior that is more compatible with the
  Fortran language than the C preprocessor can be.

To that end, we will elaborate on some of the choices JoR made in its
recommendations.


Fundamental features
====================

`__LINE__' and `__FILE__'
~~~~~~~~~~~~~~~~~~~~~~~~~

These are useful for producing error messages in Fortran programs,
even in the absence of preprocessor directives.


`#line'
~~~~~~~

Fortran program files are sometimes created with external tools (such
as `fypp' or even `cpp') that need to pass information about the true
origins of the source lines (as opposed to some temporary file created
in the process). The Fortran processor and preprocessor should be able
to know the true provenance of the lines in the Fortran program.


`#ifdef', `#ifndef'
~~~~~~~~~~~~~~~~~~~

These are two of the most frequently used directives in existing
programs (especially `#ifdef').


`#define' and `#undef'
~~~~~~~~~~~~~~~~~~~~~~

Nothing to say except that they are used extensively in existing
programs.

It does not appear necessary to support varargs-style macro
definitions such as '`#define FOO(a, b, ...)'', as they haven't
appeared in the existing body of code that JoR has assembled.


`#if', `#elif', `#else'
~~~~~~~~~~~~~~~~~~~~~~~

Also frequently used, especially in the form '`#if defined(FOO)''.


`#include'
~~~~~~~~~~

The most frequently used directive in existing codes. Sometimes used
in files that also use `INCLUDE' lines.


`#error', and maybe `#warning'
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`#error' appears in some existing codes.
It is used to pass compile-time error information to the programmer.


`##' and `#' operators
~~~~~~~~~~~~~~~~~~~~~~

These are used in some existing Fortran codes, and are useful for
constructing names and strings on the fly in the preprocessor.


Case-sensitivity
================

The C preprocessor (and so, presumably, its use in Fortran programs)
is sensitive to the case of identifiers. '`#define FOO 1'' and
'`#define foo 1'' define two different macros.

JoR recommends adopting case-sensitivity in the preprocessor to avoid
potential name clashes in existing programs. (We do not yet know the
real impact of this, and would like to investigate it further: How
often does it matter?)

Adopting case sensitivity in the Fortran preprocessor could be
problematic if identifiers are used in a case-insensitive way:

,----
| #define FOO bar
| program test
| implicit none
| integer, parameter :: FOO = 1
|
| print *, foo ! Error: `foo` is not declared; FOO was replaced by `bar`.
| end
`----


Token replacement in character literals
=======================================

C preprocessors, in general, do not look inside strings for token
replacement.

Although Hollerith data for `DATA' values and in `FORMAT' edit
descriptors was deleted from Fortran, it still appears in existing
programs. If a processor supports Hollerith data, its preprocessor
should not make token replacements in that data.


`//' is it a comment indicator or a concatenation operator
==========================================================

Only a small number of the examined programs use `//' to introduce
comments on directive lines. We should probably only support
`/* ... */' comments in the Fortran preprocessor. That would leave
`//' as the Fortran concatenation operator.


Do `INCLUDE' files get preprocessed (as if invoked with `#include')
===================================================================

Why JoR says yes:
- To simplify the semantic description of the file inclusion
  mechanisms.
- If we don't, what happens to preprocessor directives placed in the
  files brought in with `INCLUDE'? When would they get processed? The
  only correct answer seems to be "at preprocessing time."


What about directives in comments?
==================================

Some programs appear to expect that preprocessor tokens will be
expanded in comment-style directives. A couple of issues come up:
- How do you know a comment is a directive? We probably can't.
- What if we expand tokens in all comments? That might be OK. If it's
  a directive, that would be fine. If it's not a directive, then we're
  changing text that the rest of the Fortran processor is going to
  ignore anyway.


Is `!' an operator for `#if', `#elif'? Or an indicator for a comment?
=====================================================================

There are many instances of `!' being used as an operator in existing
Fortran programs. The most obvious are in expressions like
'`#if !defined(FOO)''.

Perhaps there are places where we could treat '`!'' as a comment
character (such as after '`#endif''), but it's probably safest just
not to allow '`!'' as a comment character in preprocessor directives.


Fortran awareness
=================

Expansion in fixed-form Fortran
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fixed-form provides a few challenges.
- The preprocessor should not expand tokens that start with '`C'' in
  column one. (You don't want to turn a comment into something else.)
- The preprocessor should not expand a token that looks like it starts
  in column six.
- The preprocessor should not treat a '`#'' in column six as a
  directive. Existing codes have '`#'' in column six for
  continuations.
- The preprocessor should not expand letters in a 'letter-spec-list'
  in IMPLICIT statements.
- The preprocessor should expand the identifiers following a constant
  and '`_'' for KIND names (e.g., '`1_MYREAL'').
- The length of a fixed-form line can change depending on the lengths
  of expanded tokens.
- Identifiers in Fortran are allowed to be nearly the same length (63
  characters) as the text on a fixed-form line (columns 7-72).
  Identifiers need to be recognized across continuations.

The simplest approach is probably to conjoin the text of the
fixed-form lines and continuations before expanding tokens in the
preprocessor. That is, concatenate the following before macro
replacement:
- the statement label (columns 1-5) and columns 7-72 of the initial
  line
- columns 7-72 of all subsequent continuations

Should the preprocessor need to output the preprocessed text (as if
using the common '`-E'' option to `cpp'), the expanded text would be
output according to fixed-form rules (an initial line, and proper
continuation lines as needed).


Spaces in fixed-form source
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Normally, spaces in an identifier are ignored in fixed-form source. To
simplify the preprocessor, JoR recommends treating blanks as
significant between identifiers in fixed-form source.
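

As an illustration only (not part of JoR's recommendation), the
conjoining step described under "Expansion in fixed-form Fortran"
might be sketched in Python as follows. The names `join_fixed_form'
and `expand' are hypothetical. The sketch assumes that short lines are
padded with blanks to column 72, that blanks are significant between
identifiers, and that only object-like macros exist. It skips comment
and directive lines and makes no attempt to protect character
literals or Hollerith data.

```python
import re

# Identifier pattern used for the naive, case-sensitive replacement below.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def join_fixed_form(lines):
    """Group fixed-form lines into (label, text) logical statements.

    label is columns 1-5 of the initial line; text is columns 7-72 of
    the initial line followed by columns 7-72 of each continuation.
    """
    statements = []
    for raw in lines:
        line = raw.rstrip("\n")
        # Skip blank lines and comment lines ('C', 'c', '*', '!' in column one).
        if not line.strip() or line[0] in "Cc*!":
            continue
        # '#' in column one is a directive; '#' in column six is not.
        if line[0] == "#":
            continue
        padded = line.ljust(72)  # assumption: short lines are blank-padded
        label, cont, body = padded[:5], padded[5], padded[6:72]
        if cont not in " 0" and statements:
            # Continuation line: append its columns 7-72 to the statement.
            prev_label, prev_text = statements[-1]
            statements[-1] = (prev_label, prev_text + body)
        else:
            statements.append((label.strip(), body))
    return [(label, text.rstrip()) for label, text in statements]


def expand(text, macros):
    """Replace whole identifiers found in macros (object-like, one pass)."""
    return IDENT.sub(lambda m: macros.get(m.group(0), m.group(0)), text)


# 'SCALE' is split across column 72: 'SC' ends the initial line and
# 'ALE' follows a '#' continuation marker in column six.
source = [
    "#define SCALE 2",
    "C SCALE is a macro",
    "   10 TOTAL = BASE +".ljust(70) + "SC",
    "     #ALE",
]
statements = join_fixed_form(source)
label, text = statements[0]
expanded = expand(text, {"SCALE": "2"})
```

Because the two lines are joined before tokens are examined, the
identifier `SCALE' is seen whole even though it straddles column 72,
and the '`#'' in column six is treated as a continuation marker rather
than a directive.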