To: J3                                                     J3/24-102
From: Thomas Koenig
Subject: DIN proposal for UNSIGNED type
Date: 2024-January-13

# 1. Introduction

Unsigned integers are a basic data type used in many programming languages, like C. Arithmetic on them is typically performed modulo 2^n for a datatype with n bits. They are useful for a range of applications, including, but not limited to

- hashing
- cryptography (including multi-precision arithmetic)
- image processing
- binary file I/O
- interfacing to the operating system
- signal processing
- data compression

Introduction of unsigned integers should not repeat the mistakes of languages like C, and syntax and functionality should be familiar to people who today use unsigned types in other programming languages.

# 2. C interoperability

One major use case is C interoperability, including interfacing to operating system calls specified in C. Currently, Fortran uses signed integers for interoperability with C unsigned int types, which has two drawbacks:

## 2.1 Value range limitation

An unsigned int with n bits has a value range between 0 and 2^n-1, while Fortran model numbers have values between -2^(n-1)+1 and 2^(n-1)-1. While the C standard assures agreement of representation between nonnegative interoperable Fortran integers and nonnegative unsigned ints on a companion processor, this is not the case for unsigned ints larger than 2^(n-1)-1.

## 2.2 Automatically generated C headers

It is straightforward to generate C prototypes or declarations suitable for use with the companion processor from Fortran interfaces. At least one compiler, gfortran, has an [option to do this](https://gcc.gnu.org/onlinedocs/gfortran/Interoperability-Options.html). This fails in the case where the C code specifies unsigned, and Fortran can only specify interoperable signed integers.

# 3. Avoiding traps and pitfalls

There are numerous well-known traps and pitfalls in the way that C implements unsigned integers.
These are mostly the result of C's integer promotion rules, which need to be avoided. Specifically, comparison of signed vs. unsigned values can lead to confusion, which in turn can lead to hard-to-detect errors in the code, infinite loops, and similar problems.

# 4. Prior art

At least one Fortran compiler, Sun Fortran, introduced unsigned ints. Documentation can be found at [Oracle](https://docs.oracle.com/cd/E19205-01/819-5263/aevnb/index.html).

This proposal borrows heavily from that prior art, without sticking to it in all details. The discussion at the [Fortran proposals site](https://github.com/j3-fortran/fortran_proposals/issues/2) also influenced this proposal.

# 5. Proposal

## 5.1 General

- A type name tentatively called UNSIGNED, with the same KIND mechanism as for INTEGER, plus a SELECTED_UNSIGNED_KIND function, to implement unsigned integers.
- Unsigned integer literal constants are marked with a U suffix, with an optional KIND number attached via the usual underscore.
- A conversion function UINT, with an optional KIND argument.
- Binary operations between INTEGER and UNSIGNED are prohibited without explicit conversion; binary operations between UNSIGNED and REAL are permitted.
- Unsigned integers should be permitted in a SELECT CASE.
- Unsigned integers should not be permitted as index variables in a DO statement or as array indices.
- Unsigned integers can be read or written in list-directed, namelist or unformatted I/O, and by using the usual edit descriptors such as I, B, O and Z.
- Extension of ISO_C_BINDING with KIND numbers like C_UINT, C_UINT8_T etc.
- Likewise, ISO_Fortran_binding.h should be suitably extended.
- ISO_FORTRAN_ENV should be extended with KIND parameters like UINT8, UINT16 etc.
- Behavior on conversion to an INTEGER when the value lies outside the range of that integer should be processor-dependent. For interoperable KINDs, the behavior should be identical to that of the companion C processor.
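
The prohibition of mixed INTEGER/UNSIGNED operations targets the C behavior described in section 3. The following is a minimal C sketch of that pitfall, not part of the proposal itself; the function names `c_mixed_less`, `value_less` and `check_mixed_comparison` are hypothetical and chosen only for illustration.

```c
#include <assert.h>

/* Hypothetical helper: compares the way C does implicitly.
   The usual arithmetic conversions turn s into an unsigned int
   before the comparison, so a negative s becomes a very large value. */
int c_mixed_less(int s, unsigned int u)
{
    return s < u;
}

/* Hypothetical helper: compares by mathematical value, which
   requires handling the negative case explicitly. */
int value_less(int s, unsigned int u)
{
    if (s < 0)
        return 1;   /* every negative value is below every unsigned value */
    return (unsigned int)s < u;
}

/* Self-check documenting the difference between the two. */
void check_mixed_comparison(void)
{
    assert(c_mixed_less(-1, 1u) == 0);  /* -1 is converted to UINT_MAX */
    assert(value_less(-1, 1u) == 1);    /* -1 is less than 1 by value  */
    assert(c_mixed_less(0, 1u) == 1);   /* nonnegative values agree    */
    assert(value_less(0, 1u) == 1);
}
```

Requiring an explicit conversion, as proposed, forces the programmer to choose between these two meanings instead of silently getting the first one.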

## 5.2 Behavior on overflow

In the discussion on GitHub, two possible behaviors on overflow were discussed: that overflow should be forbidden (using a "shall" directive) and that it should wrap around. The author of this proposal is of the opinion that wrap-around semantics (modulo 2^n for an n-bit type) should be specified, for several reasons:

- It is required for several applications which would otherwise be left to C, such as cryptography, hashes and big-integer arithmetic.
- The standard does not (up to now) mandate run-time checks, and an implementation which does not perform overflow checks would perform the same operation as with modulo 2^n arithmetic.
- Writing checks for overflow in user code is relatively straightforward. For example,
  ```
  c = a + b
  if (c < a) then
     ! overflow has occurred
  end if
  ```
  but this is only possible if the operation being checked is not, in fact, illegal. If overflow were forbidden, compilers would over time tend to remove such checks, because by the language definition the condition could never be true (compare the removal of NULL pointer checks in C).

# 6. Relation to other proposals

This proposal complements the BITS proposal, J3/07-007r2.pdf, as proposed in J3/22-195.txt. BITS restricts its operations to logical operations and comparisons on bit lengths, whereas this proposal is for values requiring arithmetic operations, and is less flexible in bit length.
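
As a point of reference for the wrap-around semantics argued for in section 5.2: C already defines unsigned arithmetic modulo 2^n, so the overflow check discussed there is well-defined in C. The following minimal sketch shows that check; the function name `add_wrapping` and its `overflowed` flag are hypothetical and not taken from this proposal.

```c
#include <limits.h>

/* Hypothetical helper: adds two unsigned values with wrap-around and
   reports through *overflowed whether the mathematical sum did not fit. */
unsigned int add_wrapping(unsigned int a, unsigned int b, int *overflowed)
{
    unsigned int c = a + b;   /* well-defined in C: reduced modulo UINT_MAX + 1 */

    /* The sum wrapped exactly when the result is smaller than an operand. */
    *overflowed = (c < a);
    return c;
}
```

For instance, `add_wrapping(UINT_MAX, 1u, &flag)` returns 0 and sets `flag` to 1, while `add_wrapping(2u, 3u, &flag)` returns 5 and sets `flag` to 0.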