To: J3 J3/24-108 From: Gary Klimowicz and JoR Subject: Preprocessor directives seen in existing Fortran programs Date: 2024-February-22 Reference: 23-192r1 Overview ======== To help define requirements for a Fortran-standard preprocessor, we examined the use of the C preprocessor in a collection of just under 1900 Fortran projects. These projects were collected from the projects page, and Beliavsky's plus additional codes on SourceForge or downloaded from release tar-files. The latest version of the accumulated list of Fortran examples can be found at . The files used to analyze these projects can be found at . The 'stats.csv' file in that project contains the raw data. (The Excel file that contains all the data and statistics is too large (187 MB) to place in GitHub. It's unwieldy to work with as well: recalculations and file saves can take minutes.) The statistics are not exact, but approximate. Caveats about the data analysis: - The analysis tools do not (yet) use a parser to examine the directives. - The examination is purely textual. - We try to be careful about how we look at the text, though, not looking inside comments, character constants, or Hollerith strings. - The tools do not (yet) look inside #included files. - The tools do not (yet) look inside files included via INCLUDE lines. This paper merely documents the data found so far. The projects contained - 300,000 Fortran files - Over 153+ million lines of Fortran - 189,000+ fixed-form Fortran files - 146,000+ free-form Fortran file - 86,000+ files with C-style preprocessor directives - 1.2 million C-style preprocessor directives - 48+ million lines of code in the projects that use cpp directives In the files that contain C preprocessor directives, about 2.5% of the lines are directives. (36 files contain over 1,000 directives each.) #include ======== '#include' is the most frequently used directive, appearing over 373,000 times (30.4% of all directives). Oddly enough, there are some 1600+ files that use both '#include' and Fortran INCLUDE lines. About 2% of the directives are '#define' with a value, and 0.38% of directives use '#define' with arguments. #define ======= Nothing unusual here. There are no cases where '#define' is used with a variable number of arguments '(...)' or '(arg1, arg2, ...)'. #if, #ifdef, #ifndef, #elif, #else, #endif ========================================== '#ifdef' dominates the if-test directives (about 10 times as many as '#ifndef'). If we're being completely candid, the most-used directive is really '#endif'... Comments in directives ====================== Most of the comments in the directives use the C-style /* ... */ form. There do not appear to be occurrences of /*-style comments in directives that span multiple lines. About 50 files use '//' style comments. Continuations in directives =========================== Relatively few directives are continued with backslashes (about 3,500 lines of 1.2 million directives). #error and #warning =================== About 120 files contain '#error'. About 20 files contain '#warning'. #pragma ======= 8 files contain '#pragma push_macro' and '#pragma pop_macro'. These are not defined in the C standard. #line and # ======================== About 50 files appear to have been preprocessed already, and contain either '#line' directives, or '# ' directives. Hollerith and free-form '&' continuation ======================================== There is only one file that appeared to embed an '&' at the end of a Hollerith string. It was is really a continuation, with the Hollerith text continued on the next line. The question this answered is "Which comes first? Hollerith recognition or continuation processing?" Continuation processing. This means that a phase model for preprocessing might apply, as it does in the C standard. Unrecognized directives ======================= About 1,100 files contain directives we couldn't recognize. Some appear to be comments; some are incorrect like '#end'. Some may be bugs in the analysis tools. Fortran operators ================= There is one file (MOPAC@openmopac/src/deprecated/mod_calls_cublas.F90) that uses '.AND' in a '#if' directive. +---- | #if (MAGMA .and. LINUX) +---- This is may be a mistake. Otherwise, there are no examples of Fortran '.conditional.' operators in directives. Further work ============ Examine included files ~~~~~~~~~~~~~~~~~~~~~~ We currently examine only files that have suffixes that lead us to conclude they are Fortran files ('.f', '.F', '.f90', etc.). We should also examine all files that appear in '#include' directives and 'INCLUDE' lines. Case sensitivity ~~~~~~~~~~~~~~~~ To understand better whether the preprocessor should be case-sensitive (as the C preprocessor is) or case-insensitive (as the rest of Fortran is), it would be good to know if it matters in the list of sample projects. Fortran users think it does. Data would be good to have. Unrecognized directives ~~~~~~~~~~~~~~~~~~~~~~~ There are a number of files with directives that don't match the forms defined in the C standard. What is the new '#embed' directive in C? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The recent draft C standard includes a new directive, '#embed'. I haven't looked into it, or what other changes there might be in the upcoming C standard for the preprocessor. Complete summary of statistics seen =================================== Description Count Percentage ------------------------------------------------ ----------- ---------- Total number of Fortran lines 154,908,009 Percent of total lines that are directives 1,228,773 0.79% Directives containing #include 373,588 30.4% Directives containing #define 24,367 1.98% Directives containing #define with arguments 4,660 0.38% Directives containing #undef 15,564 1.27% Directives containing #ifdef 205,926 16.76% Directives containing #ifndef 20,955 1.71% Directives containing #if 132,183 10.76% Directives containing #elif 17,373 1.41% Directives containing #else 70,819 5.76% Directives containing #endif 358,990 29.22% Directives containing #pragma 16 0.% Directives containing #line 104 0.01% Directives containing # nnn 1,656 0.13% Directives containing #error 232 0.02% Directives containing newline 48 0.% Directives with nothing following the # 112 0.01% Directives containing unrecognized commands 2,180 0.18% Directives containing #define with varargs (...) 0.% Directives containing \ continuations 3,462 0.28% Directives containing # operator 48 0.% Directives containing ## operator 85 0.01% Directives using Fortran operators (.EQ., ...) 1 0.% Directives where the '#' is indented 623 0.05% Directives containing /* ... */ comments 19,053 1.55% Directives containing /* ... comments 0.% Directives containing C-style // ... comments 217 0.02% Instances of Fortran INCLUDE lines 210,049 17.09% Instances of Hollerith constants 38,451 3.13% Instances of Hollerith constants ending in '&' 1 0.% Directives containing #if ...! not operator 18,031 1.47%