J3/02-312

Date:     1 Nov 2002
To:       J3
From:     Richard Maine
Subject:  Module initialization

EXECUTIVE SUMMARY

The dependency reversal of submodules suggests to me some
powerful capabilities not mentioned in the modules TR draft.
Realization of these capabilities requires one additional
feature, which has been previously discussed in other contexts.
That feature is initializer procedures for modules.

What we can then get, in conjunction with type extension and
type-bound procedures, is the ability for a submodule to usefully
add type extensions without needing to modify the invoking
code in any way (not even recompiling it).

EXAMPLE OF A PROBLEM THIS CAN SOLVE

As an example (based on actual applications of mine), consider
applications that may want to read data from many different file
formats.  Input for each format requires a separate module (or
submodule as it will turn out) of procedures and associated data.
In the current operational code of the application I'm thinking
about, there are n+2 modules to support reading n file types.
Each file type has dedicated module.  There is a low-level
module, which has things that need to be accessible to all the
file type modules.  There is a top-level file input module, which
uses SELECT CASE to invoke procedures from the appropriate file
type modules.  Only the top-level file input module is used by
the main applications.  This is all in a library that is used by
many applications (everything at Dryden that accesses any of our
flight data).

A single application may be simultaneously accessing multiple
files using this library, so all state information about a
file is stored in a derived type structure, one of which is
allocated for each file.

Suppose that a user needs to work with a previously unsupported
file type.  This happens often, for various reasons.

That user needs to write the module for the file type in question.
He then needs to modify a copy of the top-level file-input
module to add appropriate USE statements and extend the SELECT
CASEs to incorporate the new file type.  The user then needs
to recompile and relink all applications that will use the new
file type.  This includes applications that the user didn't
write and doesn't normally have the source code for.

The capability we would like is for the user to be able to just
write the module for the new file type and incorporate it into
programs without recoding or recompiling anything else.

I've presented this particular example because it is a real
problem that I am personally familiar with, but I think this class
of problem generalizes to lots of scenarios of adding support for
a new specific type of object within an existing framework.

HOW TYPE_BOUND PROCEDURES PLUS SUBMODULES ALMOST SOLVE IT

Instead of using SELECT CASE to select the procedure for the
appropriate file format, we make our derived type for file
information, an extensible type.  We make the procedures to be
invoked type-bound.  Each file type module extends the type and
binds its procedures to the extended type.  The type extension
turns out to be independently useful anyway because each file
type has a different collection of state information specific
to that file type, in addition to the information common to
all types; this fits perfectly with type extension, which is
much cleaner than the awkward hacks used in the f90/f95 version.

The type-bound procedures get rid of the SELECT CASE statements.
With submodules for the file types, we get rid of the USE
statements for each file type in the top-level file input module.
We also merge the low-level module back into the top-level one,
where it more naturally goes; it was separated out only to avoid
circular USEs (top-level module uses file type module, which uses
top-level module), which no longer happen because the file type
modules can get what they need by host association.

If we have an object of the appropriate extended type for a
particular file type module, invoking the read procedure for that
object will get us to the submodule's read procedure without that
particular procedure or submodule having ever been known about in
the higher level code.

So we can almost do the job without modifying or recompiling the
top-level file input module, which in turn means that we won't
have to recompile all the applications.

ONE PIECE MISSING

If nothing outside of the submodule knows about its particular
type extension, how does an object of that extended type get
created in the first place?  We need a hook to bootstrap with.
But it doesn't need to be a very big hook - and it's one we can
also use for other purposes.

Suppose we have a procedure in the submodule that is executed to
initialize the submodule.  Though this particular application
only needs such a procedure for submodules, it makes sense to
generalize initialization procedures to modules as well.  Such
things have been proposed before, but fell off the train.

The submodule could have a procedure (not the initialization
procedure yet - we'll get to that), which is called to decide
whether a particular file is one it supports or not.  This
could be based on the file name, on a string naming the file
type, on examining the file contents, or on anything else;
the important thing is that it is a run-time determination.

The top-level module could have a list (linked list, array,
whatever) of pointers to these procedures, one for each file type.
WHen a new file is opened, the procedures in this list are
called until one of them recognizes a file type that it supports
(or none of them do).  The procedure that suceeds will create
the object.

The submodule initialization procedure would likely consist of
a single executable statement something like

   call add_to_file_type_recognition_list(proc_to_recognize_my_files)

where add_to_file_type_recognition_list is defined in the top-level
file input module.

THE RESULT

All the user has to do is write the new submodule, make sure it is
linked into each program, and support for the new file type will
be done.  Multiple custom file types can be added to a single
application.

This even opens the possibility that dynamic linking could be
used to load necessary support submodules during execution.  I
don't propose that we directly address that kind of issue in the
standard, but I do note that we'll have much of the underpinnings
necessary for it.  (Yes, that is useful and I have applications
that do it today, though it requires very nasty hackery.  I have
a server application that runs 24x7, restarting only when the
system goes down for maintenance or something of the sort a few
times a year; it regularly loads modules that weren't even yet
written when it started execution).

PROPOSAL

That the modules TR add a module/submodule initialization procedure,
which is invoked once to initialize the module/submodule.  I don't
particularly care about the order of invocation when there are
multiple modules/submodules with such procedures - probably simplest
to just let it be processor-dependent (otherwise that would turn into
by far the most complicated part of this proposal).

Van, if you want to use this as an excuse to also have the TR make all
module variables SAVEd, I think I can buy that.  Though it isn't an
integral part of this proposal, it would seem to go with the concept
that the module is initialized only once.  I don't think I want to go
into the complications involved if one thinks about a module being
unloaded and possibly needing reinitialization - the initialization
procedure ought to execute only once in any case.  If it initializes
things that need saving (which it presumably will), then those things
had just better be SAVED, either by user declaration or by being
implicitly SAVEd.  I think that's a lot simpler than getting into the
possibility of reinitialization on reloading of a module.