
tc-list New ENTMP

  • Timothy John Finney
    Message 1 of 4 , Aug 2 4:13 AM
      Wieland Wilker said

      > My only wish is that there should be one section in which ALL the
      > available manuscripts are in plain ASCII format.

      I agree.

      Here is an idea I had along these lines. If we all agreed on the basic
      text of a MS (filling in lacunae with what we think was probably there)
      and wrote these letters down as a column, then we could add in whatever
      other information we wanted by filling in adjacent columns:

      Letter Read? N. S. Abbrev.
      O Y N N
      _
      Q Y Y N
      E N N Y
      O N N Y
      S Y Y N
      _

      Some explanation... _ = a word break; Y = yes; N = no; Read? = Can it be
      read?; N. S. = Is there a nomen sacrum superscript above the letter?;
      Abbrev. = Has this letter been subsumed in an abbreviation?

      I'm only talking concepts here. Once the first column (the ASCII text) is
      established, others can add to the parallel columns as they wish. That's
      why I call this parallel mark-up. If someone comes along with another
      feature that they want to include (e.g. accents) they simply add another
      column. With this approach, different people can work on different aspects
      of the text simultaneously. As they do so, the agreed-upon text is double,
      triple, n-tuple checked. If an error turns up, the row in question has to
      be retranscribed for every column. Believe me, it is much simpler to
      process text that is in this format. In fact, I ended up writing programs
      that put my in-line mark-up transcriptions into this format so that I
      could write a straight-forward program to collate the transcriptions.
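
The parallel mark-up idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the column headers are shortened to single tokens (Read, NS, Abbrev) so that they split on whitespace, and the helper names parse_parallel, base_text and add_column are hypothetical, not part of any agreed format.

```python
# One row per letter, one column per feature; "_" marks a word break.
TABLE = """\
Letter Read NS Abbrev
O Y N N
_
Q Y Y N
E N N Y
O N N Y
S Y Y N
_
"""

def parse_parallel(text):
    """Turn the column layout into a list of dicts keyed by column name."""
    lines = text.splitlines()
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        fields = line.split()
        if not fields:
            continue
        if fields == ["_"]:
            rows.append({"Letter": "_"})  # a word break carries no features
        else:
            rows.append(dict(zip(header, fields)))
    return rows

def base_text(rows):
    """Recover the agreed first column as running text."""
    return "".join(" " if r["Letter"] == "_" else r["Letter"] for r in rows).strip()

def add_column(rows, name, values):
    """Add a new feature column without touching the existing ones."""
    letters = [r for r in rows if r["Letter"] != "_"]
    if len(values) != len(letters):
        raise ValueError("need exactly one value per letter")
    for row, value in zip(letters, values):
        row[name] = value

rows = parse_parallel(TABLE)
add_column(rows, "Accent", ["N"] * 5)  # a later contributor adds a column
print(base_text(rows))
print([r["Letter"] for r in rows if r.get("Read") == "Y"])
```

Because every feature lives in its own column, adding one never disturbs the base text, which is the property the post relies on.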

      Bob Waltz said

      > But I would argue that the first
      > step is to decide what information we need in a collation.

      I recommend the inclusion of spelling variations. My research indicates
      that MS spellings say as much as textual variations when it comes to
      grouping MSS. A big problem is regularisation, whereby spelling variation
      has to be standardised before collation. It seems to me that the only safe
      way to do regularisation in a highly inflected language like Greek is to
      specify the standard spelling of every single word in the MS. Global
      replacement is not safe because one scribe's spelling variation can be
      another scribe's textual variation.
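
A small sketch of why global replacement is risky and what a per-word table might look like. The words are borrowed from the 071 transcription later in this thread (BEIQLEEM, where EI stands for H, and EIS); the function names and the position-keyed table are assumptions made for illustration only.

```python
# Words as they stand in the manuscript (illustrative selection).
ms_words = ["EN", "BEIQLEEM", "EIS", "IEROSOLUMA"]

# Per-word regularisation: word position -> standard spelling.
# Only the word actually judged to be a spelling variation is listed.
standard = {1: "BHQLEEM"}

def regularise(words, table):
    """Replace only the words named in the table; leave everything else alone."""
    return [table.get(i, w) for i, w in enumerate(words)]

def global_replace(words, old, new):
    """The unsafe alternative: rewrite every occurrence everywhere."""
    return [w.replace(old, new) for w in words]

print(regularise(ms_words, standard))
print(global_replace(ms_words, "EI", "H"))  # also turns EIS into HS
```

The global rule fixes BEIQLEEM but silently damages EIS, whereas the per-word table changes only what a transcriber has explicitly decided to regularise.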

      Mike Bossingham said

      > What is really needed is not a common data base - these are
      > transient and come and go. But a means of sending transcriptions
      > in plain ASCII around the net - so that data can be built into
      > the different data bases and used.

      For what it's worth, I agree. All we need to agree upon is the text of the
      manuscript. Mark-up can then be added by whoever wants to, for whatever
      reason (see Dave Washburn's comment below and the above idea). Of course,
      we would want some quality control.

      Larry Hurtado said

      > Electronic data is ephemeral enough, and people come &
      > go, lose interest, get sick, etc.; we need a good institutional base
      > for the work.

      I heartily agree with the first clause, except to say that electronic data
      need not be ephemeral. A plain ASCII transcription of a manuscript's text
      need not be ephemeral at all -- letters are letters and you can either
      read them or not. Even a group of scholars can agree on the text of a
      manuscript when confronted with a good facsimile. I have some reservations
      about the second clause. I would say that we need a good standard approach
      as the basis for our work. Take the Internet as an example. It is built
      upon standards (ISO, IEC, IEEE, etc). Many of these standards are pretty
      much what the industry had already invented in order to get a job done.
      Whether or not a committee is involved may be irrelevant. There was only
      one Tim Berners-Lee needed to get WWW off the ground.

      Dave Washburn said

      > As soon as someone can tell me that I won't get sued for it, I can
      > post these to a temporary web site. Since the transcriptions are
      > mine, of course, I can post them with no problem, I refer to the
      > images.

      I have always been worried about copyright of images. I don't know whether
      this is paranoia or fear of getting manuscript custodians off side. How
      about if we put the images up, then one of us takes on the job of writing
      to the relevant custodian and asking for permission. That way we are doing
      the right thing by them (asking permission) and getting the images up
      straight away. They can only say no, at which point we could politely ask
      them to put the image up for us or just have a notice saying that the
      image cannot be displayed due to the custodian being a bad sport.

      > Are those [images] the ones that were on the original ENTMP? I may have
      > some of those around here...I had 0166, 071, I think P22 and
      > perhaps one other, I don't remember which.

      They are some of the ones that were on the original ENTMP. I have a few
      others as well. What I would really like to see is for the custodians to
      put their treasures on the Web. Have you seen the Oxyrhynchus site? They
      already have images of papyri there. (Go to http://www.csad.ox.ac.uk/)

      > I like Wieland's idea of going with ASCII (and I will push
      > again for CCAT mapping) to begin with, that way anybody who
      > wants to can take them and do the markup. If we do the bare
      > transcriptions in a fairly unencumbered format, markup becomes a
      > matter of retrofitting which should be easier.

      Yes, yes, yes. My only complaint about CCAT (and Beta code) is the
      transcription of xi with C and chi with X. In Athens, taxi is spelt with
      xi, not chi. Oh well.

      Jim West said

      > I don't recall which mss I transcribed. The low P's I think.

      Jim, I still have your transcriptions of P1, P2, P3, P4 and P9.



      Sorry for this gargantuan post.

      Tim Finney.
    • Dave Washburn
      Message 2 of 4 , Aug 2 9:39 AM
        Tim wrote:
        > Wieland Wilker said
        >
        > > My only wish is that there should be one section in which ALL the
        > > available manuscripts are in plain ASCII format.
        >
        > I agree.
        >
        > Here is an idea I had along these lines. If we all agreed on the basic
        > text of a MS (filling in lacunae with what we think was probably there)
        > and wrote these letters down as a column, then we could add in whatever
        > other information we wanted by filling in adjacent columns:
        >
        > Letter Read? N. S. Abbrev.
        > O Y N N
        > _
        > Q Y Y N
        > E N N Y
        > O N N Y
        > S Y Y N
        > _
        >
        > Some explanation... _ = a word break; Y = yes; N = no; Read? = Can it be
        > read?; N. S. = Is there a nomen sacrum superscript above the letter?;
        > Abbrev. = Has this letter been subsumed in an abbreviation?
        >
        > I'm only talking concepts here. Once the first column (the ASCII text) is
        > established, others can add to the parallel columns as they wish. That's
        > why I call this parallel mark-up. If someone comes along with another
        > feature that they want to include (e.g. accents) they simply add another
        > column. With this approach, different people can work on different aspects
        > of the text simultaneously. As they do so, the agreed-upon text is double,
        > triple, n-tuple checked. If an error turns up, the row in question has to
        > be retranscribed for every column. Believe me, it is much simpler to
        > process text that is in this format. In fact, I ended up writing programs
        > that put my in-line mark-up transcriptions into this format so that I
        > could write a straight-forward program to collate the transcriptions.

        The transcriptions I did didn't have any real way to indicate
        abbreviations (I assume you mean little ligatures like the trailing-off
        line on kappa at the end of a line to indicate KAI, that sort of
        thing?), but under James' direction I used brackets to indicate
        uncertain or supplied letters, and I believe we were playing with a
        pseudo-HTML tag <BAR></BAR> to indicate NS. I would tend to
        prefer something like this that keeps the format of running text,
        simply because later on it should make it easier to mark up (then
        again, I may be wrong about that). Here's the preliminary
        transcription of 071 that I did lo, those many ages ago:

        -------
        Manuscript: 071
        Folio: 1
        Side: Verso
        Transcriber: DLW
        Supplied text source: TR

        1: [unreadable]
        2: [EKENUION]KAI[EK]ALES[EN]
        3: [TOONO]MAAUTOUINTOUDe
        4: iUGENNHQENtOIENBEi
        5: QLEEMGHSIOUDAIASENH
        6: MERAISHRWDOUTOUBASI
        7: LEWSIDOUMAGOIaPOANA
        8: tOLWN[PAR]EgeNONTOEIs
        9: iEROSOLUMALEGONTE[CPOU]
        10: ESTINOT[ei/h?]QCEISBA[SILEUS]
        11: [T]WNiOUDAIWN[EIDOMEN]
        12: [G]aRAUTOUTO[NASTERAENT]
        13: [HA]nA[TOLHKAIHLQOMEN]

        Notes:

        Line 3: The abbreviation IN has a dot of sorts over it rather than the
        usual bar.

        Line 4: GENNHQENTOI for GENNHQENTOS.

        Line 4: EI for H in BEIQLEEM.

        Line 5: GHS appears to be a visual error for the THS of the TR;
        since the reading makes good sense, it is easy to see how the
        scribe missed the mistake.

        Line 10: Uncertain whether the ms reads TEIXQEIS or THXQEIS.
        The parchment is damaged and stained at this spot, and the
        reading could go either way. Either one appears to be a simple
        spelling error.
        ------
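
For comparison with the column layout proposed earlier in the thread, here is a rough sketch of turning one line of this bracketed running text into per-letter rows. It assumes that everything inside [ ] counts as "not read" (the post says brackets mark uncertain or supplied letters), it does not interpret lowercase letters, whose meaning is not stated, and the function name inline_to_rows is hypothetical.

```python
def inline_to_rows(line):
    """Split a bracketed transcription line into (letter, read_flag) rows.

    Letters inside [ ] get "N" (supplied or uncertain); all others get "Y".
    Nested brackets are not handled.
    """
    rows = []
    in_brackets = False
    for ch in line:
        if ch == "[":
            in_brackets = True
        elif ch == "]":
            in_brackets = False
        else:
            rows.append((ch, "N" if in_brackets else "Y"))
    return rows

rows = inline_to_rows("[EKENUION]KAI[EK]ALES[EN]")  # line 2 of the transcription
print("".join(ch for ch, _ in rows))                # full text, brackets removed
print("".join(ch for ch, read in rows if read == "Y"))  # only what was read
```

A line such as 10, which offers alternatives inside the brackets, would need extra conventions that this sketch does not attempt.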

        > Bob Waltz said
        >
        > > But I would argue that the first
        > > step is to decide what information we need in a collation.
        >
        > I recommend the inclusion of spelling variations. My research indicates
        > that MS spellings say as much as textual variations when it comes to
        > grouping MSS. A big problem is regularisation, whereby spelling variation
        > has to be standardised before collation. It seems to me that the only safe
        > way to do regularisation in a highly inflected language like Greek is to
        > specify the standard spelling of every single word in the MS. Global
        > replacement is not safe because one scribe's spelling variation can be
        > another scribe's textual variation.

        Agreed. If the goal is transcription rather than collation, then
        IMNSHO the ms. should be transcribed exactly as it appears.
        Ditto for collations, though I personally prefer to work with
        transcriptions rather than collations. That's a personal-level
        preference, though, and says nothing about the relative merits of
        the two.

        > Mike Bossingham said
        >
        > > What is really needed is not a common data base - these are
        > > transient and come and go. But a means of sending transcriptions
        > > in plain ASCII around the net - so that data can be built into
        > > the different data bases and used.
        >
        > For what it's worth, I agree. All we need to agree upon is the text of the
        > manuscript. Mark-up can then be added by whoever wants to, for whatever
        > reason (see Dave Washburn's comment below and the above idea). Of course,
        > we would want some quality control.

        As I recall, this was one of the original ideas of the ENTMP. Pass
        them around freely so anyone can use them, definitely. But I
        would push for a centralized location (or at least style and dtd) for
        markup, rather than each doing what is right in his own eyes...

        > Larry Hurtado said
        >
        > > Electronic data is ephemeral enough, and people come &
        > > go, lose interest, get sick, etc.; we need a good institutional base
        > > for the work.
        >
        > I heartily agree with the first clause, except to say that electronic data
        > need not be ephemeral. A plain ASCII transcription of a manuscript's text
        > need not be ephemeral at all -- letters are letters and you can either
        > read them or not. Even a group of scholars can agree on the text of a
        > manuscript when confronted with a good facsimile. I have some reservations
        > about the second clause. I would say that we need a good standard approach
        > as the basis for our work. Take the Internet as an example. It is built
        > upon standards (ISO, IEC, IEEE, etc). Many of these standards are pretty
        > much what the industry had already invented in order to get a job done.
        > Whether or not a committee is involved may be irrelevant. There was only
        > one Tim Berners-Lee needed to get WWW off the ground.

        Agreed. The whole idea of making these available on the Internet
        is to move away from the "institutional base" idea into a more freely-
        accessible format and realm (how many of us non-affiliated
        researchers have the wherewithal to travel to the Huntington library
        to look at their microfilms of the Dead Sea Scrolls? How many of
        said folks, myself included, have the cash to buy Brill's hideously-
        overpriced microfiche set?). If the transcriptions (with or without
        markup) are on, say, 100 people's computers and can be posted to
        the Web from any or all of them, I would hardly consider that
        "ephemeral."

        > Dave Washburn said
        >
        > > As soon as someone can tell me that I won't get sued for it, I can
        > > post these to a temporary web site. Since the transcriptions are
        > > mine, of course, I can post them with no problem, I refer to the
        > > images.
        >
        > I have always been worried about copyright of images. I don't know whether
        > this is paranoia or fear of getting manuscript custodians off side. How
        > about if we put the images up, then one of us takes on the job of writing
        > to the relevant custodian and asking for permission. That way we are doing
        > the right thing by them (asking permission) and getting the images up
        > straight away. They can only say no, at which point we could politely ask
        > them to put the image up for us or just have a notice saying that the
        > image cannot be displayed due to the custodian being a bad sport.

        Couched in more diplomatic language, of course ;-) What images I
        have were originally in the ENTMP "manuscript room," which
        required registration and login before permitting access. I suppose
        it would be easy enough to set up something like that again. I'm
        assuming that copyright restrictions were the reason for the login
        process, which may or may not be correct.

        > > Are those [images] the ones that were on the original ENTMP? I may have
        > > some of those around here...I had 0166, 071, I think P22 and
        > > perhaps one other, I don't remember which.
        >
        > They are some of the ones that were on the original ENTMP. I have a few
        > others as well. What I would really like to see is for the custodians to
        > put their treasures on the Web. Have you seen the Oxyrhynchus site? They
        > already have images of papyri there. (Go to http://www.csad.ox.ac.uk/)
        >
        > > I like Wieland's idea of going with ASCII (and I will push
        > > again for CCAT mapping) to begin with, that way anybody who
        > > wants to can take them and do the markup. If we do the bare
        > > transcriptions in a fairly unencumbered format, markup becomes a
        > > matter of retrofitting which should be easier.
        >
        > Yes, yes, yes. My only complaint about CCAT (and Beta code) is the
        > transcription of xi with C and chi with X. In Athens, taxi is spelt with
        > xi, not chi. Oh well.

        Agreed. However, any mapping scheme is going to have some
        less-than-ideal features. The main reason I suggested CCAT is
        because a) it's already in place and used profusely, and b) there
        are numerous fonts for just about any platform, floating around the
        Net, that are based on it, so all one has to do is download one,
        install it, and go.

        > Jim West said
        >
        > > I don't recall which mss I transcribed. The low P's I think.
        >
        > Jim, I still have your transcriptions of P1, P2, P3, P4 and P9.

        If it's all right with Jim, could you pass those along to me off-list? I
        could start putting some of these transcriptions up now if it's
        agreeable to enough folks.

        Dave Washburn
        http://www.nyx.net/~dwashbur
        A Bible that's falling apart means a life that isn't.
      • Robert B. Waltz
        Message 3 of 4 , Aug 2 10:21 AM
          On 8/2/99, Dave Washburn wrote, in part:

          >The transcriptions I did didn't have any real way to indicate
          >abbreviations (I assume you mean little ligatures like the trailing-off
          >line on kappa at the end of a line to indicate KAI, that sort of
          >thing?), but under James' direction I used brackets to indicate
          >uncertain or supplied letters, and I believe we were playing with a
          >pseudo-HTML tag <BAR></BAR> to indicate NS. I would tend to
          >prefer something like this that keeps the format of running text,
          >simply because later on it should make it easier to mark up (then
          >again, I may be wrong about that). Here's the preliminary
          >transcription of 071 that I did lo, those many ages ago:

          Just a few thoughts....

          The idea of HTML tags for nomina sacra, etc. has advantages. But
          I see a bit of a problem. HTML-style tags, obviously, use the angle
          brackets < >. But > is a symbol one sometimes encounters in a
          manuscript. Does one, then, use the correct HTML symbol &gt; in the
          transcription? If so, we're not using ASCII any more. :-)

          I might be tempted instead to use something else like {} for these
          sorts of tags. If one then wishes to convert to HTML, it's a simple
          change -- just { to < and } to >. And it leaves us [] for lacunae
          and perhaps () for uncertain letters.
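
The brace-to-angle conversion suggested above might look like this. It is a sketch only: the {BAR} tag follows the pseudo-HTML tag mentioned earlier in the thread, the sample string is made up, and escaping literal angle brackets first is an added assumption so that a > from the manuscript is not mistaken for markup.

```python
import html

# Map the brace delimiters onto angle brackets.
BRACES_TO_ANGLES = str.maketrans("{}", "<>")

def braces_to_html(text):
    """Convert {TAG}...{/TAG} mark-up to <TAG>...</TAG>.

    Literal &, < and > in the transcription are escaped first, so they
    survive as entities instead of colliding with the new tags.
    """
    escaped = html.escape(text, quote=False)
    return escaped.translate(BRACES_TO_ANGLES)

print(braces_to_html("{BAR}IN{/BAR}>TOU"))
```

The order matters: escaping after the translation would also escape the freshly created tags.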

          I'm not trying to make life difficult here, just point out some things
          to think about.

          [ ... ]

          >Agreed. If the goal is transcription rather than collation, then
          >IMNSHO the ms. should be transcribed exactly as it appears.

          I think this should be the goal even with collations. :-) Personally,
          I prefer collations most of the time, because it's so much easier to
          notice the difference between, say, ECOMEN and ECWMEN. :-) But it
          depends on circumstances. I do think that the collation base needs
          to be posted on the site in its entirety; it's *not* enough to just
          list it.
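
To illustrate why a collation makes such a difference easy to spot, here is a minimal word-level comparison against a base text using Python's standard difflib. The word lists are illustrative, the function name collate is hypothetical, and a real collation would also need to deal with word division, correctors and lacunae.

```python
from difflib import SequenceMatcher

def collate(base, witness):
    """Return only the places where the witness departs from the base."""
    matcher = SequenceMatcher(a=base, b=witness, autojunk=False)
    return [
        (tag, base[i1:i2], witness[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

base = ["EIRHNHN", "ECOMEN", "PROS"]     # collation base (illustrative)
witness = ["EIRHNHN", "ECWMEN", "PROS"]  # manuscript reading (illustrative)

for tag, base_words, witness_words in collate(base, witness):
    print(tag, base_words, witness_words)
```

The output lists the single differing word pair, which is much harder to pick out when reading two full transcriptions side by side.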

          [ ... ]

          >As I recall, this was one of the original ideas of the ENTMP. Pass
          >them around freely so anyone can use them, definitely. But I
          >would push for a centralized location (or at least style and dtd) for
          >markup, rather than each doing what is right in his own eyes...

          I have to agree with this one. I've had to work with some really
          strange collation styles in my life (the award, I think, goes to
          Davies's collations of 330, 436, 462, and 2344. I *still* stumble
          over that one sometimes). Ditto the Collate format for transcriptions.
          My first few attempts to convert Tim Finney's transcriptions of
          Hebrews resulted in quite a few bollixes, especially as regards
          correctors. Using more than one standard is a shortcut to trouble.

          [ ... ]

          > > Yes, yes, yes. My only complaint about CCAT (and Beta code) is the
          > > transcription of xi with C and chi with X. In Athens, taxi is spelt with
          > > xi, not chi. Oh well.
          >
          >Agreed. However, any mapping scheme is going to have some
          >less-than-ideal features. The main reason I suggested CCAT is
          >because a) it's already in place and used profusely, and b) there
          >are numerous fonts for just about any platform, floating around the
          >Net, that are based on it, so all one has to do is download one,
          >install it, and go.

          Obviously the big hang-up here is X/C. (At least, every scheme I've
          ever seen renders theta as Q, and the rest are obvious.) Maybe it's
          actually time for a vote. :-)

          I vote C=chi and X=xi, for what it's worth.
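
Whichever way a vote goes, converting between the two conventions is mechanical as long as it is applied to the Greek text only and not to headers or tags. A sketch, with the function name swap_xi_chi as an assumption:

```python
# Swap C and X in one pass so neither letter is converted twice.
SWAP_CX = str.maketrans("CX", "XC")

def swap_xi_chi(text):
    """Convert between the C=xi/X=chi and C=chi/X=xi conventions."""
    return text.translate(SWAP_CX)

ccat = "TACI"             # "taxi" with xi written as C, as in CCAT
print(swap_xi_chi(ccat))  # the same word with xi written as X
print(swap_xi_chi(swap_xi_chi(ccat)) == ccat)  # swapping twice restores the input
```

Doing the swap with a single translation table avoids the classic mistake of replacing C with X and then X with C, which would leave every letter as C.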

          Bob Waltz
          waltzmn@...

          "The one thing we learn from history --
          is that no one ever learns from history."
        • Dave Washburn
          Message 4 of 4 , Aug 3 9:29 AM
            Bob Waltz wrote:
            > On 8/2/99, Dave Washburn wrote, in part:
            >
            > >The transcriptions I did didn't have any real way to indicate
            > >abbreviations (I assume you mean little ligatures like the trailing-off
            > >line on kappa at the end of a line to indicate KAI, that sort of
            > >thing?), but under James' direction I used brackets to indicate
            > >uncertain or supplied letters, and I believe we were playing with a
            > >pseudo-HTML tag <BAR></BAR> to indicate NS. I would tend to
            > >prefer something like this that keeps the format of running text,
            > >simply because later on it should make it easier to mark up (then
            > >again, I may be wrong about that). Here's the preliminary
            > >transcription of 071 that I did lo, those many ages ago:
            >
            > Just a few thoughts....
            >
            > The idea of HTML tags for nomina sacra, etc. has advantages. But
            > I see a bit of a problem. HTML-style tags, obviously, use the angle
            > brackets < >. But > is a symbol one sometimes encounters in a
            > manuscript. Does one, then, use the correct HTML symbol &gt; in the
            > transcription? If so, we're not using ASCII any more. :-)
            >
            > I might be tempted instead to use something else like {} for these
            > sorts of tags. If one then wishes to convert to HTML, it's a simple
            > change -- just { to < and } to >. And it leaves us [] for lacunae
            > and perhaps () for uncertain letters.

            Good point. This is the sort of thing that needs to be agreed upon,
            though I would suggest that when we're down to the level of
            tweaking individual characters like this, we're pretty close to a
            standard. Since as I recall the > and < characters are used for
            accents, they wouldn't be a problem for the earlier mss, and such
            things could be adjusted during markup later. I could live with that.

            Of course, if this expands into Hebrew mss, then using { and }
            won't work because in CCAT these denote a couple of very
            common final-form letters...

            > I'm not trying to make life difficult here, just point out some things
            > to think about.

            And your points are well taken, at least by yours truly.

            > [ ... ]
            >
            > >Agreed. If the goal is transcription rather than collation, then
            > >IMNSHO the ms. should be transcribed exactly as it appears.
            >
            > I think this should be the goal even with collations. :-) Personally,
            > I prefer collations most of the time, because it's so much easier to
            > notice the difference between, say, ECOMEN and ECWMEN. :-) But it
            > depends on circumstances. I do think that the collation base needs
            > to be posted on the site in its entirety; it's *not* enough to just
            > list it.
            >
            > [ ... ]
            >
            > >As I recall, this was one of the original ideas of the ENTMP. Pass
            > >them around freely so anyone can use them, definitely. But I
            > >would push for a centralized location (or at least style and dtd) for
            > >markup, rather than each doing what is right in his own eyes...
            >
            > I have to agree with this one. I've had to work with some really
            > strange collation styles in my life (the award, I think, goes to
            > Davies's collations of 330, 436, 462, and 2344. I *still* stumble
            > over that one sometimes). Ditto the Collate format for transcriptions.
            > My first few attempts to convert Tim Finney's transcriptions of
            > Hebrews resulted in quite a few bollixes, especially as regards
            > correctors. Using more than one standard is a shortcut to trouble.

            I haven't tried these, but I wholeheartedly agree with the last
            sentence.

            > > > Yes, yes, yes. My only complaint about CCAT (and Beta code) is the
            > > > transcription of xi with C and chi with X. In Athens, taxi is spelt with
            > > > xi, not chi. Oh well.
            > >
            > >Agreed. However, any mapping scheme is going to have some
            > >less-than-ideal features. The main reason I suggested CCAT is
            > >because a) it's already in place and used profusely, and b) there
            > >are numerous fonts for just about any platform, floating around the
            > >Net, that are based on it, so all one has to do is download one,
            > >install it, and go.
            >
            > Obviously the big hang-up here is X/C. (At least, every scheme I've
            > ever seen renders theta as Q, and the rest are obvious.) Maybe it's
            > actually time for a vote. :-)
            >
            > I vote C=chi and X=xi, for what it's worth.

            The only problem there is getting folks like SP to make the
            necessary adjustments in their fonts etc. as well as changing all
            the transcriptions that are already out there, such as the stuff at
            the Perseus Project etc. Were it to come down to a vote, I would
            probably vote the same way. However, I can see mountains of
            impracticalities in trying to get the rest of the world to change it...

            Dave Washburn
            http://www.nyx.net/~dwashbur
            A Bible that's falling apart means a life that isn't.