To: J3                                                     J3/19-133
From: Tom Clune & Dan Nagle & Steve Lionel
Subject: Use cases for exception handling
Date: 2019-February-08


Introduction
============

This paper presents a set of closely related use cases associated with
managing exceptional situations that arise in Fortran codes.  These
scenarios are of particular concern for larger, more complex
applications, but are relevant even in relatively simple applications.
By "exceptional" we generally mean states that invalidate the
usual/intended logic of an algorithm.  Such situations can arise
from lack of available resources (open, allocate, etc.), illegal
values for an operation (cannot divide by zero, invalid communicator
in MPI call, etc.), as well as other scenarios.

Generally these use cases consist of two interacting software parts.
One part _identifies_ the exceptional situation and reports it by some
mechanism, while the other part branches depending on the existence of
the exceptional situation.  We refer to this latter aspect as "error
handling" or "exception handling".

For each use case below, we provide a semi-concrete example, describe
common practice, and then discuss some of the deficiencies in the
common practice that might be ameliorated by appropriate extensions to
Fortran.


Use Case 1:  Checking return codes from library calls
=====================================================

The typical scenario here is that a procedure invocation returns an
integer value, often called a return code, that indicates
success/failure of the invocation.  Often the return code is the last
argument to the subroutine, but for some libraries it is implemented
as the return value of a function.  Closely related variants are
Fortran executable statements that return an integer status for
exceptional conditions, e.g., IOSTAT= in the OPEN statement.  While
the implementation could ignore these return codes in the optimistic
hope that the exceptional cases never happen, such an approach is
generally untenable in larger projects.
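
As an illustration of the statement-level variant, the following
minimal sketch checks IOSTAT= (together with the companion IOMSG=
specifier) on an OPEN statement.  The file name 'input.dat' and the
variable names are hypothetical.

   PROGRAM open_check
      IMPLICIT NONE
      INTEGER :: unit, status
      CHARACTER(LEN=256) :: msg

      ! STATUS='OLD' requires the file to exist already; a nonzero
      ! IOSTAT value reports the failure instead of terminating.
      OPEN(NEWUNIT=unit, FILE='input.dat', STATUS='OLD', &
           ACTION='READ', IOSTAT=status, IOMSG=msg)
      IF (status /= 0) THEN
         PRINT*, 'OPEN failed: iostat = ', status, ' ', TRIM(msg)
         STOP 1
      END IF
      CLOSE(unit)
   END PROGRAM open_check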

To allow for failures, robust client code must always check all return
codes and take appropriate actions based on their values.  Almost
universally, a value of 0 is used to indicate success, with all other
values indicating various types of failures.  Depending on the
specific scenario, all failure modes may be handled by a single
branch, or each may have its own custom error handling logic.

A simple concrete example of error handling would be to execute a
PRINT statement and then return in the case of a failing return code:

   INTEGER :: status
   CALL foo(x, y, rc=status)
   IF (status /= 0) THEN
      PRINT*,'Error calling foo():  return code was ', status
      RETURN
   END IF

Quite often library interactions involve a sequence of several similar
procedure calls each of which requires nearly identical error
handling.  Further, in complex applications with multiple layers, the
return codes must usually be returned up through multiple software
layers with checks at each and every stage.
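
The layering described above can be sketched as follows.  The
procedure names (low, middle) and the return code value 1 are
hypothetical; the point is only that the middle layer, which handles
nothing itself, still needs a check after every call.

   MODULE layers
      IMPLICIT NONE
   CONTAINS
      ! Lowest layer: detects the exceptional condition.
      SUBROUTINE low(x, rc)
         REAL, INTENT(IN) :: x
         INTEGER, INTENT(OUT) :: rc
         rc = 0
         IF (x < 0.0) rc = 1   ! hypothetical code: negative input
      END SUBROUTINE low

      ! Middle layer: does no handling of its own, yet must check
      ! and forward the return code after every call.
      SUBROUTINE middle(x, rc)
         REAL, INTENT(IN) :: x
         INTEGER, INTENT(OUT) :: rc
         CALL low(x, rc)
         IF (rc /= 0) RETURN
         CALL low(2.0*x, rc)
         IF (rc /= 0) RETURN
      END SUBROUTINE middle
   END MODULE layers

   PROGRAM top
      USE layers
      IMPLICIT NONE
      INTEGER :: status
      CALL middle(-1.0, status)
      IF (status /= 0) THEN
         PRINT*, 'Error calling middle():  return code was ', status
      END IF
   END PROGRAM top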


1.2  Deficiencies of common practice

Deficiency 1: In higher layers of a complex code, error handling logic
can often dwarf the primary logic.  Even in the very simple case given
above, the error handling involved 5 lines of code (including the
declaration of the return code variable), while the library call was a
single line.  The error handling logic therefore dominates the
implementation and interferes with the ability of subsequent
developers to follow the intended logic.

Deficiency 2: Often the appropriate location to handle an unexpected
failure is at a much higher level in the call tree.  If a failure
condition is detected in a low-level call, each intervening procedure
must inject a tedious bit of logic to re-detect the failure condition,
pass the condition up to its caller and immediately RETURN.  It would
be much better if the program could jump directly to error-handling
logic.  This logic could decide to continue execution (perhaps with
some sort of "fix-up" or report of the condition), return from the
current procedure level (perhaps with a defined return value), or
terminate execution.  The logic might also decide that it doesn't want
to handle the exception and wants "another level" to take a turn at
handling it (nested exception handling).


Deficiency 3: The duplication of identical or nearly identical logic
in error-handling leads to maintenance difficulties.  When a decision
is made to change the error handling approach, numerous changes are
required throughout the application.  ("Shotgun surgery" in the
parlance of Code Smells.)  For example, a project may decide to start
reporting errors to ERROR_UNIT, or to include the file and line number
where the problem occurred.
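
One common mitigation is to route all reporting through a single
procedure, as in the sketch below.  The module and procedure names are
hypothetical, and the file name and line number are passed by hand
because standard Fortran offers no intrinsic for them (projects often
rely on a preprocessor instead).  Note that this only centralizes the
reporting; the check at every call site remains.

   MODULE error_report
      USE, INTRINSIC :: ISO_FORTRAN_ENV, ONLY: ERROR_UNIT
      IMPLICIT NONE
   CONTAINS
      ! Single place that decides how a failure is reported.
      SUBROUTINE report(rc, file, line)
         INTEGER, INTENT(IN) :: rc, line
         CHARACTER(LEN=*), INTENT(IN) :: file
         WRITE(ERROR_UNIT, '(A,I0,3A,I0)') &
              'Error: return code ', rc, ' at ', file, ':', line
      END SUBROUTINE report
   END MODULE error_report

   PROGRAM demo
      USE error_report
      IMPLICIT NONE
      INTEGER :: status
      status = 3   ! stands in for a failing library call
      ! The line number is hard-coded purely for illustration.
      IF (status /= 0) CALL report(status, 'demo.f90', 6)
   END PROGRAM demo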


Deficiency 4: Inconsistent approaches to returning errors.  As
mentioned above, multiple conventions exist for returning failure
states: optional and nonoptional subroutine arguments, function return
variables, side-effects, etc.  This lack of consistency is encouraged
by the lack of guidance from the Fortran standard.  If all software
implementations used an exception mechanism provided by the standard
itself, the result would be a more uniform and ultimately more
portable approach to error handling.

Deficiency 5: Integers provide only a very weak description of an
error condition.  Here there are many aspects to consider.  An integer
is limited in the amount of information it can encode.  Integers are
opaque.  Any specific value has no natural association with a
particular exception.  When multiple independent packages are exercised
within a single system, return code values may not be unique.  If the
return codes are passed up the calling tree, it may not be possible to
determine the specific type of error.  There is no standard mechanism
to determine the set of allowed return codes for a given interface.

With regard to opacity, in the best scenario a library provides an
auxiliary routine that developers can call to convert a returned error
code into an error message.  Some packages at least provide named
constants for return codes that allow developers to more clearly label
the various error handling branches.  For example:

   CALL foo(x, y, rc=status)
   SELECT CASE (status)
   CASE (ERR_USE_ALT_INTERFACE)
      CALL foo2(x, y, z, rc=status)
   CASE (ERR_RESOURCE_NOT_AVAILABLE)
      ! give up
      PRINT*,'foo() failed with return', status
   END SELECT

Even then, developers must first consult the package's documentation to
determine the names of the potential return codes.  And of course some
packages fail to document their return codes at all or even treat them
inconsistently from procedure to procedure.
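
The "best scenario" mentioned above, a package that supplies both
named constants and a routine to translate codes into messages, might
look like the following sketch.  The module name, the function
foo_error_message, the numeric values, and the message texts are all
hypothetical.

   MODULE foo_errors
      IMPLICIT NONE
      INTEGER, PARAMETER :: ERR_USE_ALT_INTERFACE = 1
      INTEGER, PARAMETER :: ERR_RESOURCE_NOT_AVAILABLE = 2
   CONTAINS
      ! Converts a return code into a human-readable message.
      FUNCTION foo_error_message(rc) RESULT(msg)
         INTEGER, INTENT(IN) :: rc
         CHARACTER(LEN=:), ALLOCATABLE :: msg
         SELECT CASE (rc)
         CASE (0)
            msg = 'success'
         CASE (ERR_USE_ALT_INTERFACE)
            msg = 'use the alternate interface'
         CASE (ERR_RESOURCE_NOT_AVAILABLE)
            msg = 'resource not available'
         CASE DEFAULT
            msg = 'unknown return code'
         END SELECT
      END FUNCTION foo_error_message
   END MODULE foo_errors

   PROGRAM show_message
      USE foo_errors
      IMPLICIT NONE
      PRINT*, 'foo() failed: ', &
           foo_error_message(ERR_RESOURCE_NOT_AVAILABLE)
   END PROGRAM show_message

Even with such a routine, a caller several layers up must know which
package produced the integer before it can ask for the message.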

Ideally, there would be a standardized approach for an interface to
indicate the potential exceptions that may arise in the use of that
interface.  Further, there would be a mechanism for a package to ensure
that its exceptions are unique.  And finally, an exception should be
able to encode additional metadata about the failure in a non-opaque
manner.


Use Case 2: Software testing frameworks
=======================================

In this use case, a developer wishes to exercise a suite of tests,
each of which exercises some aspect of an application.  The typical
structure of each test is the following:

   SUBROUTINE test()
     <declare variables>
     <set initial values>
     CALL system_under_test(...)
     <compare outputs with expected values>
   END SUBROUTINE test

This aspect alone often gives rise to many of the deficiencies
described in Use Case 1 above.  Now consider the testing framework
itself, which exercises the suite of tests in a manner
similar to the following:

   PROCEDURE(), POINTER :: p_test

   DO i = 1, num_tests
      CALL test_suite%get_ith_test(i, p_test)
      CALL p_test()
      <detect and report failures>
   END DO

The key issue here is how to implement the <detect and report
failures> chunk in the above snippet that will work (and be useful)
for a wide variety of underlying implementation styles and across
completely independent applications.  Multiple software layers with
different error handling mechanisms may be involved.  Existing
practice is to either use a passed return code (as in Use Case 1 above),
or to use a global variable such as a table of error message strings.
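
A minimal sketch of the global-variable approach follows.  All names
(test_state, record_failure, test_ref, and the two tests) are
hypothetical, and a real framework would keep a table of messages
rather than only the most recent one.

   MODULE test_state
      IMPLICIT NONE
      INTEGER :: num_failures = 0            ! global failure count
      CHARACTER(LEN=80) :: last_message = ''
      ABSTRACT INTERFACE
         SUBROUTINE test_proc()
         END SUBROUTINE test_proc
      END INTERFACE
      TYPE :: test_ref
         PROCEDURE(test_proc), POINTER, NOPASS :: run => NULL()
      END TYPE test_ref
   CONTAINS
      SUBROUTINE record_failure(message)
         CHARACTER(LEN=*), INTENT(IN) :: message
         num_failures = num_failures + 1
         last_message = message
      END SUBROUTINE record_failure
   END MODULE test_state

   MODULE tests
      USE test_state
      IMPLICIT NONE
   CONTAINS
      SUBROUTINE test_pass()
         IF (1 + 1 /= 2) CALL record_failure('test_pass: bad sum')
      END SUBROUTINE test_pass
      SUBROUTINE test_fail()
         ! Deliberately wrong expectation, so this test fails.
         IF (2 * 2 /= 5) CALL record_failure('test_fail: expected 5')
      END SUBROUTINE test_fail
   END MODULE tests

   PROGRAM driver
      USE test_state
      USE tests
      IMPLICIT NONE
      TYPE(test_ref) :: suite(2)
      INTEGER :: i, before

      suite(1)%run => test_pass
      suite(2)%run => test_fail
      DO i = 1, SIZE(suite)
         before = num_failures
         CALL suite(i)%run()
         ! Detect failures by polling the global state.
         IF (num_failures > before) &
            PRINT*, 'FAILED: ', TRIM(last_message)
      END DO
   END PROGRAM driver

This works only when every layer beneath the test cooperates by
calling record_failure and then returning normally.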

Deficiency 1: The lack of a standard mechanism to indicate the
existence of an exceptional condition and to describe the exception in
a human-readable form.  The framework cannot sensibly convert a
failure return code into any useful description because it
does not know about the underlying libraries being used.

Deficiency 2: By design, error handling here is handled at the highest
level.  The lack of a standard mechanism to jump to this error
handling when lower layers of the system under test encounter
exceptions forces testable applications to saturate each level
of the software tree with error handling logic.  Unlike Use Case 1
above, stopping execution is not an option in this context.


Use Case 3:  Software contracts
===============================

In this use case, the implementation of a set of related interfaces
provides preconditions, postconditions and invariants for those
interfaces.  Preconditions guarantee that input arguments satisfy
certain conditions, while postconditions verify that the output
arguments satisfy other conditions.  Invariants are conditions that
are expected to hold at every entry/exit and may include internal
state.  Contract verification is expected to be disabled for
performance in production code, but activated during routine
development.
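
Absent language support, contracts are typically emulated by hand.
The sketch below is one such emulation, not a proposed design: the
names (contracts, check_contracts, safe_sqrt) are hypothetical, and a
compile-time logical constant stands in for the ability to disable
verification in production.

   MODULE contracts
      IMPLICIT NONE
      ! Set to .FALSE. to disable the checks in production builds.
      LOGICAL, PARAMETER :: check_contracts = .TRUE.
   CONTAINS
      FUNCTION safe_sqrt(x) RESULT(r)
         REAL, INTENT(IN) :: x
         REAL :: r
         ! Precondition: the argument is non-negative.
         IF (check_contracts .AND. x < 0.0) &
            ERROR STOP 'safe_sqrt: precondition x >= 0 violated'
         r = SQRT(x)
         ! Postcondition: the result is non-negative.
         IF (check_contracts .AND. r < 0.0) &
            ERROR STOP 'safe_sqrt: postcondition r >= 0 violated'
      END FUNCTION safe_sqrt
   END MODULE contracts

   PROGRAM use_contract
      USE contracts
      IMPLICIT NONE
      PRINT*, safe_sqrt(4.0)
   END PROGRAM use_contract

Here a violated contract can only terminate execution via ERROR STOP;
there is no way for a caller to intercept the violation.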

While Fortran does not (yet) support software contracts, it is
conceivable that any language specification for contracts would be
closely related to the specification for exceptions.  In that regard,
this use case should at least be kept in mind when considering
potential designs for exceptions.


Use Case 4:  Processor generated exceptions
===========================================

Here we consider failure modes that are directly detected by the
processor.  Examples include overflow, divide by zero, out-of-bounds,
etc.

It would be desirable to have a mechanism to allow such failures to be
treated in a manner consistent with the treatment of user-defined
exceptions as addressed in the previous use-cases.  Performance
considerations and backwards compatibility would understandably
prevent this from being default behavior, but a mechanism could be
provided to allow a given interface to "throw an exception" when these
conditions arise rather than halting execution.
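
For floating-point conditions, the intrinsic module IEEE_EXCEPTIONS
already allows a program to request non-halting behavior and to poll
a flag afterwards, as the sketch below shows.  This is polling rather
than a transfer of control, and it does not cover conditions such as
out-of-bounds access.  The variable names are hypothetical, and the
VOLATILE attribute is used only to discourage the compiler from
evaluating the division at compile time.

   PROGRAM ieee_demo
      USE, INTRINSIC :: IEEE_EXCEPTIONS
      IMPLICIT NONE
      REAL, VOLATILE :: x, y, z
      LOGICAL :: raised

      IF (IEEE_SUPPORT_HALTING(IEEE_DIVIDE_BY_ZERO)) THEN
         ! Request that execution continue when the flag signals.
         CALL IEEE_SET_HALTING_MODE(IEEE_DIVIDE_BY_ZERO, .FALSE.)
      END IF
      ! Clear the flag so that only the division below can set it.
      CALL IEEE_SET_FLAG(IEEE_DIVIDE_BY_ZERO, .FALSE.)

      x = 1.0
      y = 0.0
      z = x / y

      ! Poll the flag; nothing transfers control automatically.
      CALL IEEE_GET_FLAG(IEEE_DIVIDE_BY_ZERO, raised)
      IF (raised) PRINT*, 'divide-by-zero was signaled'
   END PROGRAM ieee_demo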

An error-handling block should be able to have a branch that handles a
divide-by-zero exception alongside a branch that handles an invalid
procedure parameter exception.  Alternatively, a single branch should
be able to handle all exceptions including processor generated ones.