
tc-list What Collate can do

  • Timothy John Finney
    Message 1 of 3 , Dec 3, 1997
      Vincent Broman asked about Collate. I have used Dr Robinson's Collate
      program, so I can answer some of his questions. How well is another
      matter...

      Input of Texts?

      The program is SGML aware, but mostly in the output stages. I heard Peter
      Robinson say (facetiously) that he thought it was the most prolific SGML
      generator anywhere, although I suppose that the US military beats him on
      this point. I am not sure how well it translates SGML input. Robinson was
      an early convert to SGML and no one is keener than him to make a collation
      program that accepts SGML. It would be good to find out whether he has
      managed to do it and tell the list.

      Collations?

      I am not sure whether it accepts collations. You are not the only one who
      has asked this question.

      How many at a time?

      100 max.

      In what formats?

      You must specify what delimits textual divisions, what is punctuation, tag
      marker characters, gloss markers, milestone markers, and so on. The native
      format is that used in the Oxford Text Archive: e.g.,

      <ch 1><v 1> Yesterday, I went [ul]swimming[/ul]. <ch 2><v 2> The end.

      Things in <> are division markers (Robinson calls them blocks); things in
      [] are tags, with [/] specifying the end (ul means underline here). As
      you can see, it is SGML-ish.
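
      As a rough illustration of the format just described, the sample line
      can be split into divisions and words with a few lines of Python. This
      is only a sketch: the function name split_blocks and the regular
      expression are assumptions made for this example, not Collate's own
      parser.

```python
import re

# Hypothetical pattern for this sketch: either a <...> division marker
# or a run of non-space characters (a "word", tags included).
PIECE = re.compile(r"<([^<>]+)>|([^\s<>]+)")

def split_blocks(text):
    """Return a list of (division labels, words) pairs."""
    divisions = []
    labels, words = [], []
    for match in PIECE.finditer(text):
        marker, word = match.groups()
        if marker is not None:
            if words:
                # A marker that follows words starts a new division.
                divisions.append((tuple(labels), words))
                labels, words = [], []
            labels.append(marker)
        else:
            words.append(word)
    if labels or words:
        divisions.append((tuple(labels), words))
    return divisions

sample = "<ch 1><v 1> Yesterday, I went [ul]swimming[/ul]. <ch 2><v 2> The end."
for labels, words in split_blocks(sample):
    print(labels, words)
```

      Note that the [ul] tags stay attached to the word they surround, which
      matters for the comparison step.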

      The same formats as supported for output?

      Don't know.

      In what alphabets?

      Whatever you like, although if you want anyone else to be able to use
      your transcriptions you should stick to a convention. The program
      compares words (i.e. things between white space) division by division,
      and relies on the specification of some standard text as a collation
      base.
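
      The idea of comparing words, division by division, against a base text
      can be sketched as follows. This uses Python's standard difflib module
      and is not Collate's own algorithm; the function name collate_division
      and the sample readings are invented for the example.

```python
from difflib import SequenceMatcher

def collate_division(base_words, witness_words):
    """List (base reading, witness reading) pairs where the witness differs."""
    variants = []
    matcher = SequenceMatcher(a=base_words, b=witness_words, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # An empty string on one side marks an addition or an omission.
            variants.append((" ".join(base_words[i1:i2]),
                             " ".join(witness_words[j1:j2])))
    return variants

# Words are simply the things between white space, as described above.
base = "Yesterday I went swimming".split()
witness = "Yesterday we went swimming".split()
print(collate_division(base, witness))
```

      Running this once per division, for each witness, gives the raw
      material for an apparatus keyed to the base text.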

      With markup or only bare text?

      With mark-up, but it will regard the mark-up as part of the word. For
      example, [ul]blah[/ul] will NOT be equal to blah, but the 'fuzzy match'
      capability will recognise these as close to each other.
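
      A small sketch of the difference between exact and fuzzy matching,
      again with Python's difflib rather than whatever Collate does
      internally. The names similarity and strip_tags are assumptions for
      illustration only.

```python
import re
from difflib import SequenceMatcher

def similarity(a, b):
    """Return a score between 0.0 (nothing shared) and 1.0 (identical)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()

def strip_tags(word):
    """Remove [tag] and [/tag] markers from a word."""
    return re.sub(r"\[/?[^\[\]]+\]", "", word)

plain, tagged = "blah", "[ul]blah[/ul]"
print(plain == tagged)               # exact comparison: the words differ
print(similarity(plain, tagged))     # fuzzy comparison: clearly related
print(strip_tags(tagged) == plain)   # identical once the tags are removed
```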

      Does the program filter or edit collations?

      The program works by taking a set of manuscript transcriptions then
      collating them against a reference text in the classical manner. It allows
      you to regularise spelling variations and to specify how you want a
      variation unit to look. In this sense, it allows you to filter and edit
      collations.

      Support a turing-complete scripting language?

      Sorry, I don't know what this means. I do not think the program supports
      scripting languages, although the last time I spoke to him, Dr Robinson
      was talking about changing to programming in Java rather than C.

      Run in batch mode? Perform regularizations by rule?

      I don't think so. It performs regularisations individually on a local or
      global basis according to your specification (a very important capability
      -- global regularisation can be RISKY).
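
      To show why a global regularisation can be risky while a local one is
      safe, here is a minimal sketch. It is not Collate's interface: the
      function regularise, the dictionary layout and the sample rules are
      all hypothetical.

```python
def regularise(divisions, global_rules=None, local_rules=None):
    """Apply spelling rules to a witness.

    divisions:    {division label: [words]}
    global_rules: {spelling: regularised form}, applied everywhere
    local_rules:  {(division label, word position): regularised form}
    """
    global_rules = global_rules or {}
    local_rules = local_rules or {}
    result = {}
    for label, words in divisions.items():
        new_words = []
        for position, word in enumerate(words):
            if (label, position) in local_rules:
                new_words.append(local_rules[(label, position)])
            else:
                new_words.append(global_rules.get(word, word))
        result[label] = new_words
    return result

witness = {"v 1": ["thee", "end"], "v 2": ["I", "see", "thee"]}

# Global rule: also rewrites the pronoun in v 2, which is wrong.
print(regularise(witness, global_rules={"thee": "the"}))

# Local rule: touches only the first word of v 1.
print(regularise(witness, local_rules={("v 1", 0): "the"}))
```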

      Classify variants as significant/insignificant? Organize variants with
      sub-variants, sub-sub-variants, etc.?

      Don't know.

      All in all, my impression of Robinson's program is good. It has output
      formats for consequent use in database, cladistic analysis and
      multivariate analysis software. It can produce HTML output automagically.
      That means you can produce a web-oriented collation output as easily as a
      print-oriented one. It is SGML savvy, which is more than can be said for
      any other collation program (is TUSTEP? -- if there were an English
      operator's manual we would know). It has a long history in computer terms,
      is written by someone who cut his teeth on manuscripts, and is still being
      supported. I do not recommend writing your own collation program when you
      can get this one, unless you have a very good reason for doing so. (I'm
      speaking from experience here).

      Best regards,

      Tim Finney.
    • Vincent Broman
      Message 2 of 3 , Dec 4, 1997
        -----BEGIN PGP SIGNED MESSAGE-----

        > no one is keener than him to make a collation
        > program that accepts SGML. It would be good to find out whether he has

        The product blurbs suggest that Collate2-Project has some capability
        to input TEI, but that more general SGML was still in the blue-sky stage.
        I would conjecture that only a very limited subset of TEI is expected
        for this input option; full TEI would be almost as hard as full SGML.

        > I am not sure whether it accepts collations.

        The question is whether the basic function of the program is to
        start with a reference text and a set of 1-100 transcribed texts and
        output an apparatus of collations, or whether it is to start with an
        apparatus of a reference text with zero or more collations, and to add
        one more collation to it by processing one new transcribed text.
        The interactive approach described makes the second case seem most
        likely (assuming you are not required to keep your computer turned on
        {and never crash} until your set of collations is complete.)

        >>Support a turing-complete scripting language?
        >Sorry, I don't know what this means.

        A scripting language, or an extension language, is what allows the
        user to do things with the program that weren't foreseen in detail
        by the vendor (always my case). An extension language is Turing-complete
        if it has enough features to perform any computation that a Turing
        machine can (i.e. anything that any computer can), which means that
        you need functions, variable names, loops or recursion, if-tests, and some
        useful data type[s]. Writing in Java may or may not support user-extensions,
        depending on how you export the capabilities in the program.

        > I do not recommend writing your own collation program when you
        > can get this one,

        Buying and installing Windoze for my home computer, just to be able to
        eventually buy the non-cheap Collate2, which does some of the things
        I want to do, is not all that attractive to me. I'll still end up
        programming the other functions that I want. Besides, I'm not interested
        in time-wasting GUIs, let me bang on bare metal with Unix power tools.
        I'm still mulling over what my real needs are.


        Vincent Broman San Diego, California, USA
        Email: broman at sd.znet.com (home) or spawar.navy.mil or nosc.mil (work)
        Phone: +1 619 284 3775 Starship: 32d42m22s N 117d14m13s W
        === PGP protected mail preferred. For public key finger me at np.nosc.mil ===

        -----BEGIN PGP SIGNATURE-----
        Version: 2.6.2

        iQCVAwUBNIbrWGCU4mTNq7IdAQEdWAP/YHeIfOYnsT2eLhpw4b8dg48p+YXhA0X5
        +ER7BkneWlWe5sbDvFIc2GExoHbrGTAdlFevky+C7vFNUuM+oNH99lCweVU0z72X
        lq8XHf9d0ryNIGMh39BnFBKgxeJYZ+Ophzuj22zw357ePYvXCSViqmQjer/GJKxk
        tzhEwWWDhQs=
        =Drw8
        -----END PGP SIGNATURE-----
      • HPS.Bakker@nias.knaw.nl
        Message 3 of 3 , Dec 5, 1997
          A quick note regarding Collate:

          It is an excellent collation program for the Mac, which I have been using
          for several years. Until now I have been collating primarily Old Slavic and
          Greek NT MSS. It also handles Latin and Middle Dutch and provides good
          services in comparing the disparate Diatessaron witnesses. To my surprise
          and that of the developer of Collate, we have even managed to collate
          Syriac and Arabic MSS.

          With a colleague from Germany I made a small comparison with TUSTEP: what
          took him three months to learn took me about one hour with Collate. I
          think that says enough.

          Collate has a very handy feature called 'Regularisation', which enables you
          to handle orthographic variation interactively (extremely useful for
          dealing with versions). It will probably take quite a lot of time and
          energy to provide a similar implementation on other platforms.

          Computer collation in general enables you to work in a heuristic and
          transparent fashion. Cf. the review by Dr. Parker of my dissertation
          (http://shemesh.scholar.emory.edu/scripts/TC/vol02/Bakker1997rev.html), for
          which Collate served as an indispensable tool.

          Cheers! -- Michael


          Dr H.P.S. Bakker

          Netherlands Institute for Advanced Study (NIAS)
          Meijboomlaan 1
          2242 PR Wassenaar
          The Netherlands
          tel.: +31 70 512 2700
          fax: +31 70 511 7162

          Slavic Seminar
          University of Amsterdam
          Spuistraat 210
          1012 VT Amsterdam