Demo of proposed method for recovering initial text
Hi All,
Here is a demonstration of the method for recovering the initial text that I mentioned before. I am going to use an encoded version of the UBS4 apparatus of Mark's Gospel:
This is called a data matrix. It encodes the reading of each witness in the apparatus at every variation site. A 1 represents the first reading in the apparatus at a variation site, 2 the second, 3 the third, and so on. The first reading of the UBS apparatus is always that of the edited text. Consequently, the row of the data matrix corresponding to the UBS text is a string of ones. If a witness text is undefined at a variation site (e.g. due to a lacuna) then it is encoded as NA for not available.
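To make the encoding concrete, here is a minimal Python sketch of how a data matrix of this kind could be built. The three-site apparatus and the function name `encode` are made up for illustration; this is not Richard Mallett's actual encoding procedure. None stands in for NA.

```python
# Hypothetical mini-apparatus (NOT the real UBS4 data): each variation site
# lists its readings in apparatus order, with the witnesses attesting each.
apparatus = [
    [["UBS", "B", "Aleph"], ["Byz", "A"], ["D"]],
    [["UBS", "B", "Byz", "Aleph"], ["D", "A"]],
    [["UBS", "Aleph", "D"], ["Byz", "A"]],  # B is not cited at this site
]

def encode(sites):
    """Return {witness: [code per site]}; None plays the role of NA."""
    witnesses = sorted({w for site in sites for reading in site for w in reading})
    matrix = {w: [None] * len(sites) for w in witnesses}
    for j, site in enumerate(sites):
        # The first reading is coded 1, the second 2, and so on.
        for code, reading in enumerate(site, start=1):
            for w in reading:
                matrix[w][j] = code
    return matrix

matrix = encode(apparatus)
for witness, row in matrix.items():
    print(witness, row)
```

Because the UBS text always carries the first reading, its row comes out as all ones, and a witness that is not cited at a site keeps None there.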
I owe a great debt of gratitude to Richard Mallett who did the painstaking work of encoding the UBS4 apparatus for Mark.
The first step in analysis is to construct a distance matrix. This and subsequent analysis steps are performed by R scripts which I wrote for the purpose:
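As a rough illustration of this step, the Python sketch below builds a distance matrix from encoded texts. It is not the R script itself. It assumes the simple matching distance (the proportion of disagreements among variation sites where both witnesses are defined), which is an assumption on my part because the measure is not named above. The witness rows are invented.

```python
def simple_matching_distance(a, b):
    """Proportion of disagreements over sites where both texts are defined."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return None  # no shared sites, so the distance is undefined
    return sum(x != y for x, y in pairs) / len(pairs)

def distance_matrix(data):
    """Return a nested dict: dist[p][q] is the distance between p and q."""
    names = list(data)
    return {p: {q: simple_matching_distance(data[p], data[q]) for q in names}
            for p in names}

# Invented rows for illustration only; None marks an undefined site (NA).
toy = {
    "UBS": [1, 1, 1, 1],
    "Byz": [2, 1, 2, 1],
    "D":   [3, 2, None, 1],
}
dist = distance_matrix(toy)
```

Sites where either witness is NA are simply left out of the comparison, so a fragmentary witness is compared only where it is extant.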
Here is the resulting distance matrix:
This is analysed using partitioning around medoids (PAM). See my Groups article for an explanation:
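For readers who want a feel for what a medoid-based partition does, here is a small Python sketch. It does not implement the PAM build and swap heuristic; it swaps in an exhaustive search over every possible set of k medoids, which is only practical for toy data. The witnesses W1 to W6 and their distances are hypothetical.

```python
from itertools import combinations

def best_medoids(dist, k):
    """Exhaustively pick the k medoids minimising total distance to nearest medoid."""
    names = sorted(dist)
    best, best_cost = None, float("inf")
    for candidate in combinations(names, k):
        cost = sum(min(dist[w][m] for m in candidate) for w in names)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost

def assign(dist, medoids):
    """Group every witness with its nearest medoid."""
    groups = {m: [] for m in medoids}
    for w in sorted(dist):
        nearest = min(medoids, key=lambda m: dist[w][m])
        groups[nearest].append(w)
    return groups

# Hypothetical distances: two tight clusters of made-up witnesses.
position = {"W1": 0, "W2": 1, "W3": 2, "W4": 8, "W5": 9, "W6": 10}
toy_dist = {p: {q: abs(position[p] - position[q]) / 10 for q in position}
            for p in position}

medoids, cost = best_medoids(toy_dist, 2)
groups = assign(toy_dist, medoids)
```

Each medoid is an actual member of its group, which is why it can serve as the group's representative text.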
The mean silhouette width (MSW) plot indicates that 3-, 6-, 11-, and 24-way partitions are reasonable choices for this data set. For a 3-way partition, PAM chooses these medoids (i.e. group representatives): Psi, Byz, it-i. Unfortunately, Psi lacks the first half of Mark, so I will go to the next suggested number of groups (6). PAM now chooses these medoids: B, Byz, vg, it-ff-2, arm, 205.
Applying the recovery method to these group representatives (i.e. taking the most frequent reading at each variation site) produces a text with quite a few NAs. These occur at variation sites where no single reading is the most frequent across the group medoids. Quite often there is a 3 to 3 tie. To see the encoded medoids and the recovered text based on a 6-way partition ("Rec-6"), download this and open in a spreadsheet:
I chose the most frequent reading by inspection so it is quite possible that I've made mistakes. Please tell me if you find any.
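The most-frequent-reading rule can also be applied mechanically, which may help with checking. The Python sketch below takes the encoded medoid texts and returns the recovered text, giving None (NA) wherever the top count is tied. It assumes that a medoid which is itself NA at a site is left out of the count; that handling is my assumption. The medoids M1 to M6 and their readings are invented.

```python
from collections import Counter

def recover(medoid_texts):
    """Most frequent reading per site across medoids; None where there is a tie."""
    recovered = []
    for readings in zip(*medoid_texts.values()):
        # Assumption: medoids that are NA at this site do not vote.
        counts = Counter(r for r in readings if r is not None)
        ranked = counts.most_common()
        if not ranked:
            recovered.append(None)  # every medoid is NA here
        elif len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            recovered.append(None)  # tie for most frequent reading
        else:
            recovered.append(ranked[0][0])
    return recovered

# Six hypothetical medoids over four variation sites.
toy_medoids = {
    "M1": [1, 1, 2, 1],
    "M2": [1, 2, 2, None],
    "M3": [1, 1, 1, 2],
    "M4": [2, 2, 1, 2],
    "M5": [1, 1, 3, 2],
    "M6": [1, 2, 2, 1],
}
rec = recover(toy_medoids)
```

In this toy case the second site has a 3 to 3 tie and so comes out as NA, the same situation that arises with six medoids.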
An 11-way partition selects these medoids: B, Byz, Delta, it-ff-2, W, Theta, 205, cop-bo, vg, it-k, arm. The following table gives their encoded texts along with "Rec-11", the most frequent reading across 11 medoids:
This time there are fewer NAs because there are fewer ties for the most frequent reading.
Analysing a data set which includes the texts recovered through 6- (Rec-6) and 11-way (Rec-11) partitions produces these classical multidimensional scaling (CMDS) and divisive clustering (DC) results:
The CMDS map places the recovered texts somewhere between the major varieties, as might be expected. The DC dendrogram puts the recovered text in the same branch as Families 1 and 13, which I didn't expect.
The MSW plot indicates that a 10-way partition is a reasonable choice:
Using PAM to divide the data set into this many groups puts the recovered text in the same group as 700, 1342, it-aur, it-f, it-l, it-q, vg, arm, eth, geo, and Augustine. I didn't expect that, either!
B: UBS Aleph B Psi 2427
Byz: A f-13 33 157 180 579 597 1006 1010 1071 1241 1243 1292 1424 1505 Byz E F G H N Sigma Lect syr-p syr-h slav
Delta: C L Delta
it-ff-2: D it-a it-b it-c it-d it-ff-2 it-i it-r-1
Theta: Theta 565 syr-pal
205: f-1 28 205 syr-s
Rec-6: 700 1342 it-aur it-f it-l it-q vg arm eth geo Augustine Rec-6 Rec-11
cop-bo: 892 cop-sa cop-bo
Poorly classified (worst last):
it-q arm 892 eth geo 700 L 1342
Ranking witnesses by distance from Rec-11 produces this (* = distance is not statistically significant for this data set):
Rec-6 (0.028); vg (0.205); it-l (0.229); 205 (0.230); slav (0.235); f-1 (0.254); Lect (0.281); 1292 (0.283); Byz (0.288); 1243 (0.294); G (0.295); it-aur (0.295); 180 (0.302); 1006 (0.302); 597 (0.304); f-13 (0.310); 700 (0.310); E (0.310); 157 (0.314); 1424 (0.315); 1010 (0.317); 1241 (0.317); 1505 (0.320); H (0.322); geo (0.322); F (0.324); 1071 (0.325); Augustine (0.326); A (0.333); syr-h (0.337); arm (0.339); Sigma (0.348); it-q (0.348); syr-p (0.352); 28 (0.357); 33 (0.361); it-f (0.367*); cop-bo (0.367*); 565 (0.371); eth (0.373); 579 (0.379); Theta (0.382); 1342 (0.384*); N (0.400*); C (0.404*); syr-pal (0.404*); it-ff-2 (0.414*); 892 (0.416*); Delta (0.432*); L (0.433*); it-i (0.436*); cop-sa (0.436*); syr-s (0.437*); it-c (0.446*); 2427 (0.475*); W (0.476*); Psi (0.500*); UBS (0.504*); it-b (0.510*); it-k (0.537*); Aleph (0.542*); it-r-1 (0.556*); it-d (0.560); it-a (0.569); B (0.570); D (0.595)
I wasn't expecting the closest witness to the recovered text to be Jerome's Vulgate. Family 1 (which includes 205) is not much further away.
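A ranking of this kind can be produced along the following lines. This Python sketch assumes the simple matching distance (disagreements divided by the number of sites where both texts are defined), which may differ from what the R scripts compute, and it leaves out the statistical significance test behind the asterisks. The texts Rec, X, Y and Z are made up.

```python
def distance(a, b):
    """Simple matching distance over sites defined in both texts (assumed measure)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return None
    return sum(x != y for x, y in pairs) / len(pairs)

def rank_by_distance(data, reference):
    """List (witness, distance) pairs, nearest to the reference text first."""
    ref = data[reference]
    scored = [(name, distance(text, ref))
              for name, text in data.items() if name != reference]
    # Witnesses sharing no defined sites with the reference are dropped.
    scored = [(name, d) for name, d in scored if d is not None]
    return sorted(scored, key=lambda item: item[1])

# Invented encoded texts; None marks an undefined site (NA).
toy = {
    "Rec": [1, 2, 1, 1],
    "X":   [1, 2, 1, 2],
    "Y":   [2, 1, 1, 1],
    "Z":   [1, 2, None, 1],
}
ranking = rank_by_distance(toy, "Rec")
line = "; ".join(f"{name} ({d:.3f})" for name, d in ranking)
print(line)
```

Note that a witness with many NAs can rank as very close on the strength of only a few shared sites, which is one reason a significance check matters.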