To: J3                                                     07-153
From: Rich Bleikamp
Subject: web site uploading
Date: 2007 January 31

The current paper # assignment and upload capability is designed to
ensure that a text version of a new paper is available to upload
before a new paper # is assigned. This helps reduce mistakes in
paper # assignment, and ensures a text file is provided for each
paper, as our rules require.

As part of the upload process for a new (r0) text paper, certain
validity checks are performed, the paper # is assigned and written
into the text file, and the paper179.txt file is updated
automatically, relieving the librarian of this tedious task.
(paper179.txt is used throughout this document as a placeholder for
the next meeting's paperxxx.txt file.) The author and subject are
extracted from the uploaded text file. The result is a very simple
interface that is easy to use, as long as the text file meets the
documented requirements.

What this scheme does less well is uploading revisions and standing
documents. While they can be uploaded, using an alternative script,
the same level of automation is not available. The paper179.txt file
is not automatically updated, and no author/subject information is
available.

Some of these shortcomings can be addressed by asking the user to
provide the author/subject information when uploading a non-text
file, and updating the paper179.txt file automatically with that
information. Other shortcomings can be addressed by enhancing the
Fortran program Dan wrote that handles paper #s and the paper179.txt
file updating.

The other potential problem with this scheme is that it is very
dependent on the exact format of the paper179.txt file, and does not
track/validate revisions well (this could be addressed, with
difficulty).

- - - -

Alternative approach:

I'm suggesting a different design that solves some problems with the
above approach, at the expense of no longer ensuring that a paper #
will be used once assigned.

1) Provide a "get paper number" web page, which asks for the author's
   name and Subject, and returns a paper #. The last paper # assigned
   would be kept in a configuration file, not in the paper179.txt
   file. The paper179.txt file would show a status of pending for
   this paper.

2) Provide an upload page that requires a file named as per J3 naming
   conventions, and uploads that file, updating the paper179.txt file
   appropriately. I personally think this is a big advantage of this
   approach. All papers will be named correctly on the user's
   computer, which will help avoid paper # mixups.

   This upload page would (in addition to uploading the document):

   - for an r0 text paper, change the status in the paper179.txt file
     from pending to text (a text file would still have to be
     uploaded first, except for standing documents).

   - for a revision or non-text file upload, allow the user to enter
     the Author and Subject if they have changed (using the
     previously supplied values otherwise), and update the
     paper179.txt file. NOTE that the user-supplied values for
     subject and author might not match what's in the actual file, if
     the user makes a mistake.

   - for standing papers, allow uploads of any revision in any
     format. That way, r0 could have text and PDF, while r1 could
     have HTML and PDF only, ... (we do this now, for some papers).
     Also update the paper179.txt file. Also update the current
     standing documents link in
     j3-fortran.org/doc/standing/links/...

   - enforce any restrictions we decide are appropriate, such as:

     + require a text file to be uploaded before other formats, both
       for r0 and revision papers (except standing documents)

     + require user confirmation before uploading 'old' revisions in
       a new format, after uploading a newer revision in any format
       (i.e. if 07-013r2.pdf is uploaded after 07-013r1.txt, then
       trying to upload 07-013r1.pdf would require explicit user
       confirmation)

     + disallow overwriting any paper already on the server (we do
       this now), but we could provide a special override capability
       if necessary, possibly password protected.

     + we could check the Author / Subject provided via the web form
       against what's in a text paper, or just always extract the
       Author/Subject from a text paper like Dan does now.

Other comments on this alternative scheme:

It is easy for the upload script to check that the file name (the
filename up to the ".") matches exactly the paper # in a text file,
if the paper # is on the first line or two, in the prescribed format.
No such check is possible for non-text files. I think we need to live
with such mistakes, or require the user who made the error to fix it.
Burdening the librarian with this task is no longer viable.

The actual list of papers for a given year would be kept in a flat
non-formatted file, one that is easy to manipulate with the scripting
language used. The actual paperxxx.txt file would be regenerated from
this database every time a paper is uploaded. This seems less fragile
than using the paperxxx.txt file as the paper # database.

A simple configuration file would keep the 'current' meeting number
and year information, used for uploading new papers, and would need
updating once after each meeting. A script would be available to edit
the config file, and prepare new directories for new years and new
meeting #s. All papers for the 'last' meeting would need to have
numbers assigned before updating the config file for the next
meeting/year, but those old papers (such as the meeting minutes and
treasurer's report) could be uploaded anytime (including revisions
thereto).

An alternative get paper # script could be provided to allow working
with 'old' meetings if needed. The user would have to provide the
year/meeting # information explicitly.

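As a rough illustration of the file name check and the 'old revision'
confirmation rule described above, here is a minimal sketch. It is
written in Python purely for illustration, not in the PHP used for the
scripts discussed here. The file name pattern (inferred only from the
examples in this paper), the assumption that the paper # appears as a
separate token in the first two lines, and the function names
parse_name, stem_matches_text and needs_confirmation are all
assumptions, not part of any existing script.

```python
import re

# Assumed naming pattern, inferred from the examples in this paper
# (07-153, 07-013r1.txt, 07-013r2.pdf): yy-nnn, optional rN, extension.
NAME_RE = re.compile(r"^(\d{2}-\d{3})(?:r(\d+))?\.([A-Za-z0-9]+)$")


def parse_name(filename):
    """Split a file name into (paper number, revision, extension).

    Returns None if the name does not fit the assumed pattern.
    A name without an rN suffix is treated as revision 0.
    """
    m = NAME_RE.match(filename)
    if m is None:
        return None
    number, rev, ext = m.groups()
    return number, int(rev or 0), ext.lower()


def stem_matches_text(filename, text):
    """True if the file name up to the "." appears as a whole token
    in the first two lines of the text paper (assumed header layout)."""
    stem = filename.rsplit(".", 1)[0]
    header_tokens = " ".join(text.splitlines()[:2]).split()
    return stem in header_tokens


def needs_confirmation(filename, existing):
    """True if a newer revision of the same paper is already uploaded.

    `existing` is a list of file names already on the server.
    """
    new = parse_name(filename)
    if new is None:
        raise ValueError("not a valid paper file name: " + filename)
    for other in existing:
        old = parse_name(other)
        # Same paper number, strictly higher revision already present.
        if old and old[0] == new[0] and old[1] > new[1]:
            return True
    return False
```

With 07-013r1.txt and 07-013r2.pdf already on the server,
needs_confirmation("07-013r1.pdf", ...) returns True, which is the
case where the user would be asked to confirm. A non-text upload can
only go through parse_name; stem_matches_text applies to text papers
only, as noted above.
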
This scheme would eliminate Dan's Fortran program, which simplifies
some aspects of the design, but will require someone else to become
at least a little proficient using PHP (this needs to happen anyway).
I didn't know any PHP before starting this effort, and I think anyone
could pick it up fairly easily, if they want to. Other scripting
languages could be used instead if necessary. I'm willing to support
the committee and these scripts through Jan 2008, whatever approach
we decide on.

I guess I've combined the notion of a separate "get paper number"
script with the suggestion to not use the paperxxx.txt file as the
paper # database. These are separable. I do think implementing the
whole system in one language (it has to be a web scripting language)
is a good idea, but it isn't really needed to get the desired
functionality.

rich