To: J3                                                     J3/24-102
From: Thomas Koenig
Subject: DIN proposal for UNSIGNED type
Date: 2024-January-13

# 1. Introduction

Unsigned integers are a basic data type used in many programming
languages, like C.  Arithmetic on them is typically performed modulo
2^n for a data type with n bits. They are useful for a range of
applications, including, but not limited to:

- hashing
- cryptography (including multi-precision arithmetic)
- image processing
- binary file I/O
- interfacing to the operating system
- signal processing
- data compression

Introduction of unsigned integers should not repeat the mistakes of
languages like C, and syntax and functionality should be familiar to
people who today use unsigned types in other programming languages.

# 2. C interoperability

One major use case is C interoperability, including interfacing to
operating system calls specified in C. Currently, Fortran programs
must use signed integers to interoperate with C unsigned integer
types, which has two drawbacks:

## 2.1 Value range limitation

An unsigned int with n bits has a value range from 0 to 2^n-1,
while Fortran model numbers have values between -2^(n-1)+1 and
2^(n-1)-1.  The C standard assures agreement of representation
between nonnegative interoperable Fortran integers and unsigned
ints of the same value on a companion processor, but this does not
help for unsigned values larger than 2^(n-1)-1, which have no
counterpart among the Fortran model numbers.
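
The range limitation can be illustrated from the C side. The following is a minimal sketch, assuming a typical companion processor where `int` and `unsigned int` have the same width and signed integers use two's complement. The function names `fits_in_signed_int` and `reinterpret_as_int` are hypothetical and chosen for this illustration only.

```c
#include <limits.h>
#include <stdbool.h>
#include <string.h>

/* True if u is also representable as a nonnegative int, i.e. if a
   Fortran INTEGER(C_INT) can hold the same value with the same
   representation. */
bool fits_in_signed_int(unsigned int u)
{
    return u <= (unsigned int)INT_MAX;
}

/* Copies the bit pattern of u into an int.  This mimics what a
   Fortran procedure sees today when a C unsigned int is passed to
   an INTEGER(C_INT) dummy argument.  The result for values above
   INT_MAX assumes a two's complement representation. */
int reinterpret_as_int(unsigned int u)
{
    int i;
    memcpy(&i, &u, sizeof i);
    return i;
}
```

Under these assumptions, values up to `INT_MAX` pass through unchanged, while `UINT_MAX` arrives on the signed side as -1.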

## 2.2 Automatically generated C headers

It is straightforward to generate C prototypes or declarations
suitable for use with the companion processor from Fortran
interfaces.  At least one compiler, gfortran, has an
[option to do this](https://gcc.gnu.org/onlinedocs/gfortran/Interoperability-Options.html).
This fails where the C code specifies an unsigned type, because
Fortran can only specify interoperable signed integers.

# 3. Avoiding traps and pitfalls

There are numerous well-known traps and pitfalls in the way that C
implements unsigned integers.  These are mostly the result of C's
integer promotion and implicit conversion rules, which need to be
avoided.  Specifically, comparisons between signed and unsigned
values can give surprising results, leading to hard-to-detect errors
such as infinite loops.
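
The comparison pitfall can be sketched in C as follows. This is an illustration only; the function names `naive_less` and `safe_less` are hypothetical. In a mixed comparison, C converts the `int` operand to `unsigned int` before comparing, so a negative value turns into a very large positive one.

```c
#include <stdbool.h>

/* Mixed comparison as C performs it: s is implicitly converted to
   unsigned int, so a negative s compares as a huge positive value. */
bool naive_less(int s, unsigned int u)
{
    return s < u;
}

/* Comparison by mathematical value: a negative s is smaller than
   every unsigned value; otherwise the conversion is value-preserving. */
bool safe_less(int s, unsigned int u)
{
    return s < 0 || (unsigned int)s < u;
}
```

With these definitions, `naive_less(-1, 1u)` is false although -1 is mathematically smaller than 1, while `safe_less(-1, 1u)` is true. Prohibiting mixed INTEGER and UNSIGNED operations without explicit conversion, as proposed in section 5.1, rules out the first form.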

# 4. Prior art

At least one Fortran compiler, Sun Fortran, introduced unsigned ints.
Documentation can be found at [Oracle](https://docs.oracle.com/cd/E19205-01/819-5263/aevnb/index.html).
This proposal borrows heavily from that prior art, without sticking
to it in all details.  The discussion at the [Fortran proposals
site](https://github.com/j3-fortran/fortran_proposals/issues/2) also
influenced this proposal.

# 5. Proposal

## 5.1 General

- A type name tentatively called UNSIGNED, with the same KIND
  mechanism as for INTEGER, plus a SELECTED_UNSIGNED_KIND function,
  to implement unsigned integers.

- Unsigned integer literal constants are marked with a U suffix,
  with an optional KIND number attached via the usual underscore.

- A conversion function UINT, with an optional KIND argument.

- Binary operations between INTEGER and UNSIGNED are prohibited
  without explicit conversion; binary operations between UNSIGNED
  and REAL are permitted.

- Unsigned integers should be permitted in a SELECT CASE

- Unsigned integers should not be permitted as index variables in a DO
  statement or array indices

- Unsigned integers can be read or written in list-directed,
  namelist or unformatted I/O, and by using the usual edit
  descriptors such as I, B, O and Z

- Extension of ISO_C_BINDING with KIND numbers like C_UINT, C_UINT8_T
  etc.

- Likewise, ISO_Fortran_binding.h should be suitably extended.

- ISO_FORTRAN_ENV should be extended with KIND PARAMETERs like
  UINT8, UINT16 etc.

- Conversion of an unsigned value to an INTEGER whose range does not
  include that value should be processor-dependent. For interoperable
  KINDs, the behavior should be identical to that of the
  companion C processor.

## 5.2 Behavior on overflow

In the discussion on GitHub, two possible behaviors on overflow were
discussed: that overflow should be forbidden (using a "shall"
directive), or that the result should wrap around.

The author of this proposal is of the opinion that wrap-around semantics
(modulo 2^n for an n-bit type) should be specified, for several reasons:

- It is required for several applications which would otherwise be left
  to C, such as cryptography, hashes and big-integer arithmetic

- The standard does not (up to now) mandate run-time checks, and an
  implementation which does not perform overflow checks would perform
  the same operation as with modulo 2^n arithmetic

- Writing checks for overflow in user code is relatively straightforward.
  For example,
```
    c = a + b
    if (c < a) then
      ! overflow has occurred
    end if
```
  Such a check is only possible if the operation being checked is not
  itself illegal.  If overflow were prohibited, compilers would over
  time tend to remove such checks, because under the language
  definition the condition could never be true (compare the removal
  of NULL pointer checks in C).
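
For comparison, the same overflow check is well-defined in C today, because C specifies modulo 2^n arithmetic for unsigned types. The following is a minimal sketch, not part of the proposal; the function names `add_wraps` and `add_two_limbs` are hypothetical. It also shows how the check supplies the carry needed for the multi-precision arithmetic mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stores (a + b) modulo 2^32 in *sum and returns true if the
   mathematical sum did not fit into 32 bits. */
bool add_wraps(uint32_t a, uint32_t b, uint32_t *sum)
{
    *sum = a + b;       /* result is reduced modulo 2^32 */
    return *sum < a;    /* true exactly when a carry occurred */
}

/* Adds two 64-bit numbers stored as two 32-bit limbs each
   (index 0 is the low limb).  A carry out of the high limb is
   discarded, so the result is taken modulo 2^64. */
void add_two_limbs(const uint32_t x[2], const uint32_t y[2], uint32_t r[2])
{
    bool carry = add_wraps(x[0], y[0], &r[0]);
    r[1] = x[1] + y[1] + (carry ? 1u : 0u);
}
```

If unsigned overflow were prohibited instead, the comparison in `add_wraps` would test for a situation that a conforming program could never reach, which is exactly the kind of check a compiler is entitled to remove.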

# 6. Relation to other proposals

This proposal complements the BITS proposal, J3/07-007r2.pdf, as
proposed in J3/22-195.txt.  BITS restricts its operations to logical
operations and comparisons on bit lengths, whereas this proposal is
for values requiring arithmetic operations, and is less flexible
in bit length.