To: J3					07-153
From: Rich Bleikamp
Subject: web site uploading
Date: 2007 January 31

The current paper # assignment and upload capability is designed to
ensure a text version of a new paper is available to upload before
assigning a new paper #.  This helps reduce mistakes in paper #
assignment, and ensures a text file is provided for each paper as our
rules require.

As part of the upload process for a new (r0) text paper, certain
validity checks are performed, the paper # is assigned and is written
into the text file, and the paper179.txt file (paper179.txt is
used as a placeholder for the next meeting paperxxx.txt file
throughtout this document) is updated automatically, relieving the
librarian of this tedious task.  The author and subject are extracted
from the text file uploaded.  A very simple interface, easy to use,
as long as the text file meets the documented requirements.

What this scheme doesn't do as well at is uploading revisions and
standing documents.  While they can be uploaded, using an alternative
script, the same level of automation is not available.  The paper179.txt
file is not automatically updated, and no author/subject information
is available.  Some of these shortcomings can be addressed by asking
the user to provide the author/subject information when uploading
a non-text file, and updating the paper179.txt file automatically
with that information.  Other shortcomings can be addressed by
enhancing the Fortran program Dan wrote that handles paper #s and
the paper179.txt file updating.

The other potential problem with this scheme is that it is very
dependent on the exact format of the paper179.txt file, and
does not track/validate revisions well (could be addressed,
with difficulty).
- - - -
Alternative approach:

I'm suggesting a different design, that solves some problems with the
above approach, at the expense of making sure a paper # will be used
once assigned.

1) Provide a "get paper number" web page, which asks for the author's
   name and Subject, and returns a paper #.  The last paper # assigned
   would be kept in a configuration file, not in the paper179.txt file.
   The paper179.txt file would show a status of pending for this paper.

2) Provide an upload page, that requires a file named as per J3 naming
   conventions, and uploads that file, updating the paper179.txt file
   appropriately.  I personally think this is a big advantage of
   this approach.  All papers will be named correctly on the users
   computer, and will help avoid paper # mixups.

   This upload page would (in addition to uploading the document):

     - for an r0 paper text paper, change the status in the paper179.txt
       file from pending to text (a text file would still have to be
       uploaded first, except for standing documents).

     - for a revision or non-text file upload, allow the user to enter
       the Author and Subject if they have changed (using the previously
       supplied values otherwise), and updating the paper179.txt file.
       NOTE that the user supplied values for subject and author might
       not match whats in the actual file, if the user makes a mistake.

     - for standing papers, allow uploads of any revision in any format,
       that way, r0 could have text and pdf, while r1 coould have HTML
       and PDF only, ...  (we do this now, for some papers).  Also
       update the paper179.txt file.  Also update the current
       standing documents link in j3-fortran.org/doc/standing/links/...

     - enforce any restrictions we decide are appropriate, such as:

         + require a text file to be uploaded before other formats,
           both for r0 and revision papers (except standing documents)

         + require user confirmation before uploading 'old' revisions
           in a new format, after uploading a newer revision in any
           format (i.e. if 07-013r2.pdf is uploaded after 07-013r1.txt,
           then trying to upload 07-013r1.pdf would require explicit user
           confirmation)

         + disallow overwriting any paper already on the server (we do
           this now), but we could provide a special override capability
           if necessary, possibly password protected.

         + we could check the Author / Subject provided via the web form
           against what's in a text paper, or just always extract the
           Author/Subject from a text paper like Dan does now.


Other comments on this alternative scheme:

   It is easy for the upload script to check that the file name
   (the filename up to the ".") matches exactly the paper # in a text
   file, if the paper # is on the first line or two, in the prescribed
   format.  No such check is possible for non-text files.  I think we need
   to live with such mistakes, or require the user who made the error to
   fix it.  Burdening the librarian with this task is no longer viable.

   The actual list of papers for a given year would be kept in a flat
   non-formatted file, one easy to manipulate with the scripting
   language used.  The actual paperxxx.txt file would be regenerated
   from the database everytime a paper is uploaded.  This seems less
   fragile than using the paperxxx.txt file as the paper # database.

   A simple configuration file would keep the 'current' meeting number
   and year information, used for uploading new papers, and would need
   updating once after each meeting.  A script would be available to
   edit the config file, and prepare new directories for new years and
   new meeting #s.

   All papers for the 'last' meeting would need to have numbers assigned
   before updating the config file for the next meeting/year, but those
   old papers (such as the meeting minutes and treasurers report) could
   be uploaded anytime (including revisions thereto).   An alternative
   get paper # scipt could be provided to allow working with 'old'
   meetings if needed.  The user would have to provide the year/meeting#
   information explicitly.

   This scheme would eliminate Dan's fortran program, which simplifys
   some aspects of the design, but will require someone else to become
   at least a little proficient using PHP (needs to happen anyways).
   I didn't know any PHP before starting this effort, and I think anyone
   could pick it up fairly easily, if they want to.  Other scripting
   languages could be used instead if necessary.

I'm willing to support the committee and these scripts thru Jan 2008,
whatever approach we decide on.

I guess I've combined the notion of a separate "get paper number"
script with the suggestion to not use the paperxxx.txt as the paper
# database.  These are seperable.  I do think implementing the whole
system in one language (has to be a web scripting language) is a
good idea, but isn't really needed to get the desired functionality.

rich