
Re: data modeling and manuscript-driven analysis

  • yennifmit
    Message 1 of 11 , Apr 9 9:25 PM
      Hi Steven,

      [Your comments in quotes because all I get when I hit "reply" to your email in the Yahoo groups interface is an empty box!]

      "Does your current model allow for manuscript-driven analysis?"

      Yes, if by that you mean "Can I focus analysis on a particular witness?" There are various ways to do that:

      1. Look for that witness in analysis results for a data set that includes the witness.
      2. Use the witness as a reference when calculating a distance matrix. (The R scripts I've written to construct distance matrices can be asked to keep a particular witness.)

      "In your data modeling, can you see the location of Alexandrinus, compared to either a Critical Text, or Byzantine, or Received Text data-point (I am taking 3 elements that can be taken as discrete, eg. using Stephanus 1550 for the TR or NA-27 for the CT or Robinson-Pierpont for the Byz) say book by book ? Or .. Gospels. Or "rest of NT". Looking only at the places where there are significant variant units."

      Yes. E.g., look for A in UBS-based data sets or 02 in INTF-based ones. (Beware: A = Ausgangstext (the ECM text) for INTF-Parallel data sets.)



      (I used dendrograms for the first two because Alex. is inside the Byzantine cloud -- with Family Pi -- and therefore hard to see in a CMDS map [= whirling cube].)

      Or, if you want to rank witnesses by distance from Alex., my "rank.r" script does that:

      > source("rank.r")
      Rank witnesses by distance from a reference.
      Asterisked distances are not statistically significant (alpha = 0.05).
      Distance matrix: ../dist/Acts-UBS2.P45.15.SMD.csv
      Counts list: ../dist/Acts-UBS2.P45.15.counts.csv
      Reference witness: A
      P74 (0.211); 81 (0.216); C (0.283); 33 (0.318); B (0.322); cop-bo (0.349); P45 (0.357*); Aleph-c (0.367*); 181 (0.386); vg (0.404); 1739 (0.405); 945 (0.421); Origen (0.429*); Lucifer (0.437*); cop-sa (0.447*); it-r (0.455*); arm (0.461*); it-ar (0.473*); geo (0.473*); 629 (0.476*); E (0.482*); 630 (0.489*); Psi (0.500*); syr-p (0.511*); it-e (0.531*); 326 (0.533*); 436 (0.548*); eth (0.549*); syr-h (0.559*); 88 (0.564*); Lect (0.570*); 614 (0.580); 1505 (0.580); it-l (0.583*); it-gig (0.584); 104 (0.587); 2412 (0.588); 1241 (0.590); 2492 (0.592); 1877 (0.598); 0142 (0.600); Byz (0.602); 2127 (0.603); 056 (0.606); 2495 (0.608); P (0.613); 451 (0.617); 049 (0.622); 330 (0.622); it-h (0.667*); it-d (0.680); Chrysostom (0.688); D (0.719); it-p (0.825)

      (The figures are simple matching distances. The smaller the distance, the closer the witness.)
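      To make the mechanics concrete, here is a minimal sketch in Python (the scripts mentioned above are actually in R) of ranking witnesses by simple matching distance from a reference. The witness names and readings below are made up for illustration, not taken from a real data set:

      ```python
      # Hypothetical sketch: rank witnesses by simple matching distance
      # from a reference witness. None marks an undefined reading (lacuna).

      def simple_matching_distance(a, b):
          """Proportion of variation units where two witnesses disagree,
          counting only units where both have a defined reading."""
          pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
          if not pairs:
              return None
          return sum(x != y for x, y in pairs) / len(pairs)

      # Each witness is a list of readings, one per variation unit.
      witnesses = {
          "A":   [1, 1, 2, 1, 3, 1],
          "B":   [1, 1, 1, 1, 3, 2],
          "Byz": [2, 1, 2, 2, 1, 2],
          "D":   [2, 3, 1, 2, None, 2],
      }

      reference = "A"
      ranked = sorted(
          ((w, simple_matching_distance(witnesses[reference], r))
           for w, r in witnesses.items() if w != reference),
          key=lambda t: t[1],
      )
      for w, d in ranked:
          print(f"{w} ({d:.3f})")
      ```

      The smaller the distance, the closer the witness, so the first entry printed is the reference's nearest neighbour.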

      A distance can only be obtained if the item you want to compare is in the data set. Many of the data sets include items corresponding to the ECM/NA/UBS text (e.g. ECM, A, UBS) or the Majority text (e.g. Byz, Maj). Some have an item corresponding to the TR.

      "And then see a number like this, or a data point positioning on a graph:

      Alexandrinus - Acts - 90% agreement CT
      - 15% agreement TR
      - 8% agreement Byz

      Those are made up numbers, and are designed to simply remind us that occasionally there are unusual agreements like CT-TR vs. Byz, or CT-Byz vs. TR.

      (Theoretically the Clementine Vulgate would be extremely good to be in there too, even the Peshitta could be very helpful especially if we ever knew when it was first translated, and of course possibly individual manuscripts.)

      Is this concept in your current data modeling? If not, is it on your radar?"

      A distance matrix contains distances between all pairs of witnesses in the source data (unless a witness is dropped through lack of defined readings). Each distance can be transformed to a percentage agreement using this:

      agreement (%) = 100 * (1 - distance)

      The witnesses you are interested in need to be in the source data to end up in the corresponding distance matrix. Some data sets (e.g. Mark, UBS4) even include a row for the Clementine Vulgate (vg-cl)!
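      The conversion is a one-liner. Here is a sketch in Python, using the first few distances from the ranking above as input:

      ```python
      # Convert simple matching distances into percentage agreements
      # using agreement (%) = 100 * (1 - distance).
      distances = {"P74": 0.211, "81": 0.216, "C": 0.283}  # vs. reference A

      agreements = {w: 100 * (1 - d) for w, d in distances.items()}
      for w, pct in agreements.items():
          print(f"{w}: {pct:.1f}% agreement")
      ```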

      By the way, there is structure within the mass of Byzantine texts.

      "Ultimately, it would also be interesting to be able to mold the search points. e.g. In my studies, there are maybe 200-250 "highly" significant omissions in the Critical Text when compared to Received Text or Byzantine Text "

      A data matrix can be sliced to choose a selection of variation units or of witnesses or of both. See the section titled "Slices of a Data Set" in my Groups article:


      A bias can be introduced by choosing variation units. Happily, the broad outline of the results seems persistent: it tends to survive no matter how you chop up the data.
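      Slicing itself is mechanical: keep only the rows (witnesses) and columns (variation units) you want. A toy illustration in Python, with hypothetical witness and unit names (the real scripts are in R):

      ```python
      # Toy data matrix: witness -> {variation unit -> reading}.
      data = {
          "A":   {"Ac 1.5": 1, "Ac 2.7": 2, "Ac 3.1": 1},
          "Byz": {"Ac 1.5": 2, "Ac 2.7": 2, "Ac 3.1": 2},
          "D":   {"Ac 1.5": 1, "Ac 2.7": 3, "Ac 3.1": 2},
      }

      # A slice keeps a chosen selection of witnesses and variation units.
      keep_witnesses = {"A", "Byz"}
      keep_units = {"Ac 1.5", "Ac 3.1"}

      data_slice = {
          w: {u: r for u, r in readings.items() if u in keep_units}
          for w, readings in data.items() if w in keep_witnesses
      }
      print(data_slice)
      ```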

      "So, taking a full NT, a search on say the 3000 most significant differences overall might, or might not, give comparable results to searches on the 250."

      One would only know by analysing both data sets.


      Tim Finney
    • yennifmit
      Message 2 of 11 , Apr 9 10:10 PM
        Hi Steven,

        [Your comments in quotes.]

        "What tools can we use, if any, to actually see the position of all the uncials on a variant?"

        One usually has to do a fair bit of digging. The INTF's New Testament Transcripts is a good place to start:


        There is also ITSEE for texts of John's Gospel:


        The data I have comes from a variety of sources, as outlined in the Sources section of my Views site:


        "Tim, what do you use to be sure you have all the uncials placed right?
        Or do you have to calculate and extrapolate as above? (It sounds like you might use a computer collation, but it is not totally clear if it is any different from the written one, which lacks many of the Byzantine majority uncial listings and forces you to do some special checking and guessing)"

        I use a number of modes of multivariate analysis (PAM, CMDS, DC). Each mode operates on a distance matrix. Each mode places the witnesses in its own way. CMDS gives you the best possible representation (according to its stress function) of distances between witnesses with the specified number of dimensions. (I specify three.) All I do is apply established multivariate analysis techniques to distance matrices derived from New Testament textual data. The data sets are always samples of one kind or another. (Even if I had a complete set of all readings of all extant NT witnesses, it would still be a mere sample of what once existed.)
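        For the CMDS mode, the core computation is standard and can be sketched in a few lines of Python with NumPy (a textbook classical MDS via double centering and eigendecomposition; the distance matrix below is illustrative only):

        ```python
        import numpy as np

        def cmds(D, k=3):
            """Classical multidimensional scaling: embed n points in k
            dimensions so that pairwise distances approximate D."""
            n = D.shape[0]
            J = np.eye(n) - np.ones((n, n)) / n         # centering matrix
            B = -0.5 * J @ (D ** 2) @ J                 # double-centred Gram matrix
            vals, vecs = np.linalg.eigh(B)              # eigenvalues, ascending
            order = np.argsort(vals)[::-1][:k]          # take top k
            vals, vecs = vals[order], vecs[:, order]
            return vecs * np.sqrt(np.maximum(vals, 0))  # n x k coordinates

        # Three witnesses; any 3-point metric embeds exactly in two dimensions.
        D = np.array([[0.0, 0.2, 0.6],
                      [0.2, 0.0, 0.5],
                      [0.6, 0.5, 0.0]])
        coords = cmds(D, k=2)
        recovered = np.linalg.norm(coords[0] - coords[1])
        print(round(float(recovered), 3))
        ```

        With three dimensions specified, each witness becomes a point in the "whirling cube" mentioned earlier.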

        "Is there a nice explanation, are 1, 2, 3, 4 variant units or the major controls?
        Can I know if Alexandrinus on Mark 1:1 is a specific variant? Did I miss an Intro?"

        The introduction to the (draft) Views site is probably the best place to start:


        For more detail, see chapter two of my "Analysis of Textual Variation":


        But the thing I would really like people to read is my Groups article:



        Tim Finney