To: J3 J3/19-133 From: Tom Clune & Dan Nagle & Steve Lionel Subject: Use cases for exception handling Date: 2019-February-08 Introduction ============ This paper presents a set of closely related use cases associated with managing exceptional situations that arise in Fortran codes. These scenarios are of particular concern for larger, more complex applications, but are relevant even in relatively simple applications. By "exceptional" we generally mean states that invalidate the usual/intended logic of an algorithm. Such situations can be arise from lack of available resources (open, allocate, etc.), illegal values for an operation (cannot divide by zero, invalid communicator in MPI call, etc.), as well as other scenarios. Generally these use cases consist of two interacting software parts. One part _identifies_ the exceptional situation and reports it by some mechanism, while the other part branches depending on the existence of the exceptional situation. We refer to this latter aspect as "error handling" or "exception handling". For each use case below, we provide a semi-concrete example, describe common practice, and then discuss some of the deficiencies in the common practice that might be ameliorated by appropriate extensions to Fortran. Use Case 1: Checking return codes from library calls. ====================================================== The typical scenario here is that a procedure invocation returns an integer value, often called a return code, that indicates success/failure of the invocation. Often the return code is the last argument to the subroutine, but for some libraries it is implemented as the return value of a function. Closely related variants are Fortran executable statements that return an integer status for exceptional conditions. E.g., IOSTAT for the OPEN statement. While the implementation could ignore these return codes in the optimistic hope that the exceptional cases never happen, such an approach is generally untenable in larger projects. To allow for failures, robust client code must always check all return codes and take appropriate actions based on their values. Almost universally, a value of 0 is used to indicate success, with all other values indicating various types of failures. Depending on the specific scenario, all failure modes may be handled by a single branch, or each may have its own custom error handling logic. A simple concrete example of error handling would be to execute a PRINT statement and then return in the case of a failing return code: INTEGER :: status CALL foo(x, y, rc=status) IF (status /= 0) THEN PRINT*,'Error calling foo(): return code was ', status RETURN END IF Quite often library interactions involve a sequence of several similar procedure calls each of which requires nearly identical error handling. Further, in complex applications with multiple layers, the return codes must usually be returned up through multiple software layers with checks at each and every stage. 1.2 Deficiencies of common practice Deficiency 1: In higher layers of a complex code, error handling logic can often dwarf the primary logic. Even in the very simple case given above, the error handling involved 5 lines of code (including the declaration of the return code variable), while the library call was a single line. The error handling logic therefore dominates the implementation and interferes with the ability of subsequent developers to follow the intended logic. Deficiency 2: Often the appropriate location to handle an unexpected failure is at a much higher level in the call tree. If a failure condition is detected in a low-level call, each intervening procedure must inject a tedious bit of logic to re-detect the failure condition, pass the condition up to its caller and immediately RETURN. Much better would be if the program could directly jump to error-handling logic; this logic can decide to continue execution (perhaps with some sort of "fix-up" or report of the condition, return from the current procedure level (perhaps with a defined return value), or terminate execution. The logic might also decide that it doesn't want to handle the exception and wants "another level" to take a turn at handling it (nested exception handling.) Deficiency 3: The duplication of identical or nearly identical logic in error-handling leads to maintenance difficulties. When a decision is made to change the error handling approach, numerous changes are required throughout the application. ("Shotgun surgery" in the parlance of Code Smells.) For example, a project may decide to start reporting errors to ERROR_UNIT, or the file and line number where the problem occurred. Deficiency 4: Inconsistent approaches to returning errors. As mentioned above, multiple conventions exist for returning failure states: optional and nonoptional subroutine arguments, function return variables, side-effects, etc. This lack of consistency is encouraged by the lack of guidance from the Fortran standard. If all software implementations use an exception mechanism provided by the standard itself, it would encourage a more uniform and ultimately more portable approach to error handling. Deficiency 5: Integers provide only a very weak description of an error condition. Here there are many aspects to consider. An integer is limited in the amount of information it can encode. Integers are opaque. Any specific value has no natural association with a particular exception. When multiple independent packages are exercise within a single system, return code values may not be unique. If the return codes are passed up the calling tree, it may not be possible to determine the specific type of error. There is no standard mechanism to determine the set of allowed return codes for a given interface. With regard to opacity, in the best scenario a library provides an auxiliary routine that developers can call to convert a returned error code into an error message. Some packages at least provide named constants for return codes that allow developers to more clearly label the the various error handling branches. For example: CALL foo(x, y, rc=status) SELECT VALUE (status) CASE (ERR_USE_ALT_INTERFACE) CALL foo2(x, y, z, rc=status) CASE (ERR_RESOURCE_NOT_AVAILALE) ! give up PRINT*,'foo() failed with return', status END SELECT Even then, developers must first consult the packages documentation to determine the names of the potential return codes. And of course some packages fail to document their return codes at all or even treat them inconsistently from procedure to procedure. Ideally, there would be a standardized approach for an interface to indicate the potential exceptions that may arise in the use of that interface. Further there would be a mechanism for a package to ensure that its exceptions are unique. And finally, on exception should be able to encode additional metadata about the failure in a non-opaque manner. Use Case 2: Software testing frameworks ======================================= In this use case, a developer wishes to exercise a suite of tests, each of which exercises some aspect of an application. The typical structure of each test is the following: SUBROUTINE test() CALL system_under_test(...) END SUBROUTINE test This aspect alone often gives rise to many of the deficiencies described in Use Case 1 above. But then consider the testing framework itself, which then exercises the suite of tests in a manner similar to the following: PROCEDURE(), POINTER :: p_test DO i = 1, num_tests CALL test_suite%get_ith_test(i, p_test) CALL p_test() END DO The key issue here is how to implement the chunk in the above snippet that will work (and be useful) for a wide variety of underlying implementation styles and across completely independent applications. Multiple software layers with different error handling mechanisms may be involved. Existing practice is to either use a passed return code (ala Use Case 1 above), or to use a global variable such as a table of error message strings. Deficiency 1: The lack of a standard mechanism to indicate the existence of an exceptional condition and to describe the exception in a humanly readable form. The framework cannot sensibly convert a failure return code into any humanly useful description because it does not know about the underlying libraries being used. Deficiency 2: By design, error handling here is handled at the highest level. The lack of a standard mechanism to jump to this error handling when lower layers of the system under test encounter exceptions forces testable applications to either saturate each level of the software tree with error handling logic. Unlike use case 1 above, stopping execution is not an option in this context. Use Case 3: Software contracts =============================== In this use case, the implementation of a set of related interfaces provides preconditions, postconditions and invariants for those interfaces. Preconditions guarantee that input arguments satisfy certain conditions, while postconditions verify that the output arguments satisfy other conditions . Invariants are conditions that are expected to hold at every entry/exit and may include internal state. Contract verification is expected to be disabled for performance in production code, but activated during routine development. While Fortran does not (yet) support software contracts, it is conceivable that any language specification for contracts would be closely related to the specification for exceptions. In that regard, this use case should at least be kept in mind when considering potential designs for exceptions. Use Case 4: Processor generated exceptions =========================================== Here we consider failure modes that are directly detected by the processor. Examples include overflow, divide by zero, out-of-bounds, etc. It would be desirable to have a mechanism to allow such failures to be treated in a manner consistent with the treatment of user-defined exceptions as addressed in the previous use-cases. Performance considerations and backwards compatibility would understandably prevent this from being default behavior, but a mechanism could be provided to allow a given interface to "throw an exception" when these conditions arise rather than halting execution. An error-handling block should be able to have a branch that handles a divide-by-zero exception alongside a branch that handles an invalid procedure parameter exception. Alternatively, a single branch should be able to handle all exceptions including processor generated ones.