Monday, September 27, 2010

Mandaic Book of John Project Software

Today I am beginning work on the software to aid translating the Mandaic Book of John, and am using this blog post as a means to get my thoughts and design goals in order.

The original idea was to use a collaborative, wiki-like environment for editing the content. It would store the Mandaic text in a standard transliteration for editing, but output it as images (or whatever other format we need) for viewing.

Building on that basic idea, I feel that for this to be a successful and useful tool, it will need:
  1. A means to store each manuscript we're working from separately.
  2. A means to divvy up each manuscript into workable "chunks."
  3. A way to display each parallel chunk of the manuscripts side-by-side for comparison purposes.
  4. A means to annotate the Mandaic texts with footnotes, cross-references, and lexical tags.
  5. A means to display the Mandaic text as images, Unicode (Hebrew, Syriac, and Mandaic, if it goes through), or other encodings.
  6. A place where all members of the project can access and contribute to the English translation and tag the Mandaic texts.
  7. A means to annotate the *English* text with footnotes, cross-references, and lexical tags.
  8. A way to track all changes so they can be reviewed or rolled back.
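As a rough illustration of how items 1, 2, 3, and 8 might fit together, here is a minimal in-memory sketch in Python. Everything in it is hypothetical rather than a settled design: the class names, the manuscript ids like "MS-A", the chapter:verse-style alignment key that lets parallel chunks from different manuscripts line up, and the `image_ref` field pointing back at the scanned page. A real version would sit on a database instead of Python lists.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Revision:
    """One saved version of a chunk's transliterated text."""
    author: str
    text: str
    comment: str = ""


@dataclass
class Chunk:
    """One workable unit of a single manuscript (hypothetical model)."""
    manuscript_id: str               # hypothetical id, e.g. "MS-A"
    key: str                         # hypothetical alignment key shared by parallel chunks
    image_ref: Optional[str] = None  # hypothetical pointer to the scanned page or region
    history: list[Revision] = field(default_factory=list)  # every revision, oldest first

    @property
    def text(self) -> str:
        # The current text is simply the newest revision.
        return self.history[-1].text if self.history else ""

    def edit(self, author: str, text: str, comment: str = "") -> None:
        # Changes are appended, never overwritten, so all of them can be reviewed.
        self.history.append(Revision(author, text, comment))

    def rollback(self, index: int, author: str) -> None:
        # A rollback re-saves an older revision as a new one, keeping the trail intact.
        old = self.history[index]
        self.edit(author, old.text, f"rollback to revision {index}")


def parallel_view(chunks: list[Chunk], key: str) -> dict[str, str]:
    """Current text of every manuscript's chunk for one key, for side-by-side display."""
    return {c.manuscript_id: c.text for c in chunks if c.key == key}


if __name__ == "__main__":
    a = Chunk("MS-A", "1:1", image_ref="MS-A/page-003.png")
    b = Chunk("MS-B", "1:1")
    a.edit("editor1", "first draft")
    a.edit("editor1", "second draft")
    b.edit("editor2", "another reading")
    a.rollback(0, "editor1")
    print(a.text)                       # first draft
    print(len(a.history))               # 3
    print(parallel_view([a, b], "1:1"))  # {'MS-A': 'first draft', 'MS-B': 'another reading'}
```

Treating a rollback as just another revision, rather than deleting newer ones, means nothing anyone contributed is ever lost, which suits a project where several people edit the same chunks.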
Some other features to implement once there is enough data in the system to experiment with:
  1. Automatic export features to a variety of formats (.doc, .pdf, LaTeX, etc.)
  2. Automatic concordance and lexicon/dictionary generation.
  3. Hooks for standard Natural Language Processing (NLP) libraries.
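To give a feel for item 2, here is a minimal concordance sketch in Python. It assumes each chunk is stored as plain, whitespace-separated transliteration keyed by a hypothetical chunk label; the function name and the sample words are placeholders. A real Mandaic concordance would also need lemmatization (grouping inflected forms under one headword) and the project's eventual markup rules, neither of which is handled here.

```python
from collections import defaultdict


def build_concordance(chunks: dict[str, str]) -> dict[str, list[tuple[str, int]]]:
    """Map each word form to every (chunk key, word position) where it occurs.

    Assumes plain, whitespace-separated transliteration with no markup.
    Case is left untouched, since a transliteration scheme may use it meaningfully.
    """
    concordance: defaultdict[str, list[tuple[str, int]]] = defaultdict(list)
    for key, text in chunks.items():
        for position, raw in enumerate(text.split()):
            word = raw.strip(".,;:")  # drop surrounding punctuation only
            if word:
                concordance[word].append((key, position))
    return dict(concordance)


if __name__ == "__main__":
    sample = {"1:1": "alpha beta", "1:2": "beta gamma beta."}
    conc = build_concordance(sample)
    print(conc["beta"])  # [('1:1', 1), ('1:2', 0), ('1:2', 2)]
    print(sorted(conc))  # ['alpha', 'beta', 'gamma']
```

Recording the word position as well as the chunk key keeps the door open for the NLP hooks in item 3, since collocations and keyword-in-context views both need to know where in a chunk a word sits.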
Right now I'm going to see about sketching out some potential workflows and database models and get a "toy" version working to play with.

Peace,
-Steve

2 comments:

  1. You should pay very close attention to creating a good electronic transcription of each manuscript. That includes special tagging for unreadable letters/words, corrections, physically damaged parts, etc.

    If your manuscripts include/exclude diacritical signs/punctuation/vocalisation or the like, make sure that you know in advance which of the features you want to have represented in your transcriptions and how to encode them.

    In addition, you may need special encodings for representing the layout (line breaks, column breaks, etc.)

    Keep in mind that, to be carried out successfully, transcriptions will need guidelines that spell out all (or most) of the phenomena that can occur in a manuscript and how to handle them in the transcription.

    Good luck! This is a challenge, if you want to make full use of the electronic medium rather than rushing to a print edition.

  2. Some very good points, Ulrich (and points that I have been chewing on myself for a while now)! :-)

    I feel the final markup schema will be worked out in the most detail once everyone on the project has had time to look over the manuscripts and determine what information we need to keep and what would be best to leave out.

    I've always been a "preserver" when it comes to details, trying to take full advantage of encoding as many details as possible, for one never knows when they may come in useful.

    For example, in one of the manuscripts we're using, there are a lot of marginal and supralinear notes in French and Latin that I would love to encode as well, but many of them (given the quality of the scan) are very difficult to make out, and we may not have the time or the resources to preserve them. (Which saddens me, but we do have to be practical in some respects.)

    In any case, one feature I am becoming more resolved to include is linking each chunk of text to its image in the manuscript in question. That way, if there is any ambiguity, one can go right to the actual manuscript and, if necessary, make appropriate edits.

    Thanks for your good wishes. Let's see what can be done! :-)

    Peace,
    -Steve


There are several rules about commenting here:

1) All unsigned/anonymous comments will be temp-deleted. I would like the actual names of the people who comment here.

2) SPAM will be deleted outright and permanently.

3) If someone is obnoxious, I will temp-delete their comments until they become more civil.

4) By commenting here, you release the copyrights of your comments to me.

Other than that, have fun. :-)