To: J3 J3/24-116 From: Thomas Koenig and JoR Subject: A modest proposal for adding an UNSIGNED type to Fortran (DIN 6) Date: 2024-February-28 References: 24-102, 07-007 WG5 N2230 DIN Suggestions for F202Y.pdf WG5 N2142 Fortran 2020 Feature Survey Results 201710.pdf # 1. Introduction Unsigned integers are a basic data type used in many programming languages, such as C. Arithmetic on them is typically performed modulo 2^n for a datatype with n bits. They are useful for a range of applications, including, but not limited to - hashing - cryptography (including multi-precision arithmetic) - image processing - signal processing - data compression - binary file I/O - interfacing to C - interfacing to the operating system Unsigned integers were the fourth most requested item to add to Fortran 202x in 2017. It is the sixth item on the DIN national body list for inclusion in Fortran 202y. We propose adding a small set of features for an unsigned data to Fortran 202y. ## 1.1. Prior art At least one Fortran compiler, Sun Fortran, supported unsigned integers. Documentation can be found at [Oracle] (https://docs.oracle.com/cd/E19205-01/819-5263/aevnb/index.html). This proposal borrows heavily from that prior art, without sticking to it in all details. ## 1.2 Inputs to this proposal In addition to the references listed above, the discussion at the Fortran proposals site https://github.com/j3-fortran/fortran_proposals/issues/2 influenced this proposal. # 2. Goal Define a new type, UNSIGNED, with a small set of intrinsic operations and intrinsic functions that would satisfy most of the use cases listed above. ## 2.1 Value range limitation An UNSIGNED with n bits has a value range between 0 and 2^n-1. (Note that Fortran model integers have values between -2^(n-1)+1 and 2^(n-1)-1). ## 2.2 Arithmetic is closed over the UNSIGNED value range All arithmetic operations on UNSIGNED values are closed over 0 to 2^n-1. Arithmetic operations produce results equal to the result of the (mathematical) integers, modulo 2^n. The following intrinsic binary arithmetic operators are extended to support UNSIGNED values: + - * / The unary - operator shall not be applied to. UNSIGNED values. The exponentiation operator ** shall not be applied to UNSIGNED values. ## 2.3 Prohibit mixed-mode arithmetic with INTEGER and REAL The intrinsic Fortran binary arithmetic operators shall have both operands be UNSIGNED if any of the operands is UNSIGNED. The intrinsic Fortran binary relational operators (defined in R1014 rel-op) shall have both operands be UNSIGNED if either of the operands is UNSIGNED. To perform mixed-mode arithmetic with INTEGER or REAL values, the UNSIGNED operand must be converted to an INTEGER or REAL value explicitly via the INT or REAL intrinsic functions. # 3. Avoiding traps and pitfalls There are numerous well-known traps and pitfalls when using unsigned integers. We attempt to avoid these as follows: - comparison of signed vs. unsigned values: require conversion via an intrinsic function or other means. - overflow from assignment of large UNSIGNED values to similar-sized INTEGER entities: Either accept truncation (modulo 2^(n-1)) or specify the KIND with a larger range to the INT intrinsic function. - confusion about modulo arithmetic, especially with respect to subtraction (e.g., 3u - 5u < 3u .EQV. .false.): Add notes to the standard warning about this. # 4. Proposal - A type name tentatively called UNSIGNED, with the same KIND mechanism as for INTEGER, plus a SELECTED_UNSIGNED_KIND function, is added to implement unsigned integers. - Unsigned integer literal constants are marked with a U suffix, with an optional KIND specifier attached via the usual underscore. - Add a conversion function UINT, with an optional KIND. - Prohibit binary operations between INTEGER and UNSIGNED or REAL and UNSIGNED without explicit conversion. - Permit unsigned integer values in a SELECT CASE. - Prohibit unsigned integers as index variables in a DO statement or as array indices. - Allow unsigned integers to be read or written in list-directed, namelist or unformatted I/O, and by using the usual edit descriptors such as I, B, O and Z. - Allow UNSIGNED arguments to some intrinsics: - BGE(UNSIGNED, UNSIGNED) and friends - BIT_SIZE(UNSIGNED) - BTEST(UNSIGNED, INTEGER) - DIGITS(UNSIGNED) - DSHIFTL(UNSIGNED, UNSIGNED, INTEGER) - DSHIFTR(UNSIGNED, UNSIGNED, INTEGER) - HUGE(UNSIGNED) - IAND(UNSIGNED, UNSIGNED), IEOR, IOR, NOT - IBCLR(UNSIGNED, INTEGER), IBITS, IBSET - ISHFT(UNSIGNED, INTEGER, INTEGER) and ISHFTC - LEADZ(UNSIGNED) and TRAILZ - MERGE_BITS(UNSIGNED, UNSIGNED, UNSIGNED - MIN(UNSIGNED, ...) and MAX - MOD(UNSIGNED, UNSIGNED) and MODULO - MVBITS(UNSIGNED, INTEGER, INTEGER, UNSIGNED, INTEGER) - POPCNT(UNSIGNED) and POPPAR - RANGE(UNSIGNED) - SHIFTA(UNSIGNED, INTEGER), SHIFTL, SHIFTR - TRANSFER(UNSIGNED, UNSIGNED, INTEGER) - Allow UNSIGNED arguments to some array intrinsics: - IALL(UNSIGNED array, INTEGER, [, mask]) and friends - IPARITY(UNSIGNED array, INTEGER [, mask]) - CSHIFT(UNSIGNED array, INTEGER, INTEGER) - DOT_PRODUCT(UNSIGNED array, UNSIGNED array) - EOSHIFT(UNSIGNED array, INTEGER, INTEGER) - FINDLOC(UNSIGNED array, UNSIGNED, ...) - MATMUL(UNSIGNED array, UNSIGNED array) - MAXLOC(UNSIGNED array, ...), and MINLOC - MAXVAL(UNSIGNED array, ...), MINVAL - Extend ISO_C_BINDING with KIND numbers, for example, C_UINT, C_UINT8_T. - Extend ISO_C_BINDING with other things I forgot to do. - Extend ISO_Fortran_binding.h appropriately. - Extend ISO_FORTRAN_ENV with KIND PARAMETERs, for example, UINT8, UINT16, UINT32. - Conversion of an UNSIGNED value to an INTEGER outside the range of the integer is processor-dependent. - Conversion of an INTEGER value to an UNSIGNED outside the range of the integer is processor-dependent. - Conversion of an UNSIGNED value to an INTEGER with a wider range is exact. # 5. Relation to other proposals This proposal complements the BITS proposal, J3/07-007r2.pdf, as proposed in J3/22-195.txt. BITS restricts its operations to logical operations and comparisons on bit lengths. This proposal adds arithmetic operations. This proposal limits the bit lengths to common powers of two.