RE: [GTh] Visualizing Hyper-Synopsis with ClustalX2
This sounds interesting but I am not sure if I have my mind around it.
What actually do you feed into the software? Is it the actual text in each
case or what you consider to be the text subunits? If it's the actual text,
then if you add Thomas into the mix, you have to use a translation to get
all the pieces of text you're comparing into the one language? If it's the
text subunits, then it doesn't matter much, as far as I can see, what
language your text is in, but it will only analyse the content or 'gist' not
other aspects such as which tense is used in which piece of text. Maybe I'm
just tired, but I'm not quite sure I understand and I'd like to.
Rev Judy Redman
Uniting Church Chaplain
University of New England
Armidale 2351 Australia
ph: +61 2 6773 3739
fax: +61 2 6773 3749
web: http://www.une.edu.au/chaplaincy/uniting/ and
> -----Original Message-----
> From: email@example.com
> [mailto:firstname.lastname@example.org] On Behalf Of Paul Lanier
> Sent: Sunday, 10 August 2008 6:29 AM
> To: email@example.com
> Subject: [GTh] Visualizing Hyper-Synopsis with ClustalX2
> Parallel relationships among the gospels, for a specific
> saying or account, have been difficult to visualize or to
> determine with complete objectivity. Schemes that arrange
> parallel texts in adjacent columns, such as the synopsis of
> Huck-Lietzmann (1936), Throckmorton's Gospel Parallels (1989)
> or Mahlon Smith's Hyper-Synopsis
> (www.virtualreligion.net/primer/mustard.html), are extremely useful.
> These permit quick comparisons. However the relationships
> between the parallel texts must be deduced separately.
> Moreover the arguments used to determine these
> relationships are not always completely objective.
> Conclusions about text relationships are affected to a great
> degree by selective emphasis on some criteria and
> minimization of others. This is necessarily so because a
> fully objective and comprehensive analytical method for
> determining parallel text relationships has not been developed.
> Here I propose the use of the bioinformatics program,
> ClustalX2, for determining and visualizing parallel gospel
> text relationships.
> ClustalX2 (http://www.clustal.org/) determines relationships
> among biological species by computing differences between DNA
> gene sequences. The results are visualized as an evolutionary
> (phylogenetic) tree by feeding the ClustalX2 output into
> NJPLOT, a tree-generating program (Perrière, G. and Gouy, M.
> 1996 WWW-Query: An on-line retrieval system for biological
> sequence banks. Biochimie, 78, 364-369). ClustalX2 and NJPLOT
> can be employed similarly with parallel gospel texts when
> each text is coded for the absence or presence of distinct
> textual subunits. This approach is described in greater
> detail in the group file, Visualizing Hyper-Synopsis with
> ClustalX2, where it is applied to gospel parallels for Logia
> 5, 20, 54 and 94. An example of the approach for GTh 20
> parallels is presented here.
> ClustalX generates the following tree for the parallel texts
> GTh 20 // Mk 4:30-32 // Lk 13:18-19 // Mt 13:31-32:
> _______ Mt 13:31-32
> | |__________ Lk 13:18-19
> | _____________ GTh 20
> |_____________ Mk 4:30-32
> The distances between branches measure their differences.
> Here Mt and Lk are close parallels. GTh and Mk are also close
> parallels. The tree can be interpreted in terms of common
> ancestors. In this interpretation, which is appropriate for
> the Four-Source Hypothesis, Mt and Lk share a common ancestor
> (such as Q), while GTh and Mk also share a common ancestor
> (such as proto-Thomas or proto-Mark). The tree can also be
> interpreted in terms of one text depending on its closest
> neighbor. For proponents of the Farrer Hypothesis, Lk borrows
> from Mt, while GTh borrows from Mk. The Mt/Lk branch borrows
> from the GTh/Mk branch.
> The tree is a visual representation of the following coded
> sequences for the parallel texts. Here a binary code
> represents the absence (coded 'a') or the presence (coded 't', for text) of
> a specific subunit. A list of all text subunits is followed
> by the coded sequences for each parallel text. The text
> source used here is the Scholar's Version.
> Mustard Seed (all text subunits)
> 1. two questions
> 2. mustard seed
> 3. smallest
> 4. falls
> 5. tossed
> 6. sown/sowed
> 7. ground
> 8. field
> 9. prepared soil
> 10. garden
> 11. comes up
> 12. grew/grows up
> 13. produces
> 14. large plant
> 15. biggest/largest of garden plants
> 16. became/becomes tree
> 17. branches
> 18. shelter
> 19. birds
> 20. of the sky
> 21. roost/roosted in its branches
> 22. nest in its shade
>                      1         2
> GTh 20:1-4  atttaaaataaattaaatttaa
> Mk 4:30-32  tttaattaaatatatatattat
> Lk 13:18-19 ttaataaaatataaataattta
> Mt 13:31-32 attaatataaataattaattta
> The method appears generally useful for measuring and
> visualizing differences between parallel texts with complete
> objectivity. It can be argued that the analysis should be
> performed with Greek or Strong's Numbers rather than an
> English translation. In principle that seems a valid
> argument, one which points to a failure of textual criticism
> to provide an adequate comparison text (at least for some
> passages). It can also be argued that the method does not
> allow for comparisons of style (chreia, chiasm, etc). This is
> true at present, but this can be corrected by adding to each
> sequence terms for various styles.
> Finally, the method is in an early stage of development. I
> think it will be interesting to see how various Thomas
> parallel sayings are visualized, especially if recurring
> patterns emerge. Comments?
> regards, Paul Lanier
> Gospel of Thomas Homepage: http://home.epix.net/~miser17/Thomas.html
--- In firstname.lastname@example.org, "Judy Redman" <jredman@...> wrote:
> What actually do you feed into the software? Is it the actual text in each case or what you consider to be the text subunits?
Text subunits. For details of the complete process, see the tutorial
at the end of this post. Apologies for my feeble attempt to draw a
tree in the post. It displayed correctly when I previewed the post,
but the post deleted spaces. I am classifying it as 'Yahoo scribal
> if you add Thomas into the mix, you have to use a translation to get all the pieces of text you're comparing into the one language.
Yes. This requires a consistent translation. I don't see a way around
this problem, except maybe to translate Coptic GTh into the equivalent
NT Greek, and that raises perhaps more difficult issues. So far the
results seem reasonable, especially with PHYLIP unrooted trees. For
example, with GTh 41 parallel texts, the resulting unrooted tree is
similar to what is expected: GTh is far from the synoptics, and the
Mt/Mk branch is close to the Lk branch. Now a very interesting thing
occurs when a hypothetical "Terse Saying 41" ('Those who have nothing
will be deprived of nothing.') is added. Now the tree has Mt and Lk on
the same branch (consistent with Q), with the nearby Mk branch closer
to the TS41/GTh41 branch. All of this of course makes perfect sense:
in going from a proto-GTh to Lk, first GTh branches off, then Mk, then
Mt. Even the lengths of the branches suggest the time period
involved, and are consistent with what one might expect for oral tradition.
This really needs to be visualized. I will post a file on it in a day
> it will only analyse the content or 'gist' not other aspects such as which tense is used in which piece of text.
Yes. However the one performing the analysis chooses how to identify a
distinct text subunit. It could be a series of identical words, a
series of Strong's Numbers, or even a series of equivalent words. The
series can include other information, such as whether the text
contains a specified style. The only requirement is that each parallel
text be reduced to a coded binary sequence. There is great flexibility
here. I will be exploring the use of Strong's Numbers to see if the
method is valid for some non-Thomas synoptic parallels.
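To make the coding concrete, here is a minimal sketch that turns a coded sequence back into the subunits it marks as present. It uses the Mustard Seed subunit list and the GTh 20:1-4 sequence from the original post; the function name `decode` and the variable names are hypothetical, chosen only for illustration.

```python
# Subunit labels copied from the "Mustard Seed (all text subunits)" list.
MUSTARD_SEED_SUBUNITS = [
    "two questions", "mustard seed", "smallest", "falls", "tossed",
    "sown/sowed", "ground", "field", "prepared soil", "garden",
    "comes up", "grew/grows up", "produces", "large plant",
    "biggest/largest of garden plants", "became/becomes tree",
    "branches", "shelter", "birds", "of the sky",
    "roost/roosted in its branches", "nest in its shade",
]


def decode(sequence, subunits):
    """Return the labels of the subunits coded 't' (present) in sequence."""
    if len(sequence) != len(subunits):
        raise ValueError("sequence length must match the subunit list")
    if set(sequence) - {"a", "t"}:
        raise ValueError("only 'a' (absent) and 't' (text) are allowed")
    return [label for code, label in zip(sequence, subunits) if code == "t"]


# GTh 20:1-4 as coded in the original post.
gth20 = "atttaaaataaattaaatttaa"
print(decode(gth20, MUSTARD_SEED_SUBUNITS))
```

Decoding each sequence this way is a quick check that the coding matches the translation being used before anything is fed to ClustalX2.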
The tutorial follows.
For clarity, here is an example of the complete process. This assumes
ClustalX2, NJPLOT and PHYLIP are installed on Windows XP.
Incidentally, ClustalX2, NJPLOT and PHYLIP are all free. They also run
as portable applications from a USB drive.
First, the distinct text subunits are identified and listed in order.
Then each parallel text is converted to a coded binary sequence where
'a' = absent, 't' = text. As an example, suppose we have these
parallel texts for the saying, 'Dog and fleas.'
A. My dog has fleas.
B. My dog has lots of fleas.
C. Uh oh, my dog has lots of fleas.
D. A dog has fleas but many dogs have no fleas.
The ordered list of distinctive text subunits is:
1. uh oh
6. lots of
8. but many dogs have no fleas
Next, coded binary sequences for each saying are created. Parallel 'A'
contains text subunits 2,4,6,7. So its coded binary sequence is
atatatta, where 1=absent('a'), 2=text('t'), 3=absent('a'),
Likewise parallel 'B' contains text subunits 2,4,5,6,7. So its binary
sequence is atatttta.
'C' contains text units 1,2,4,5,6,7. Its binary sequence is ttatttta.
Finally, 'D' = 3,4,5,7,8 = aatttatt.
So for these four parallel texts, the sequence alignment is:
A atatatta
B atatttta
C ttatttta
D aatttatt
Now, to construct a tree of relationships, the binary sequences are
treated as DNA gene sequences. It would be nice to use 1's and 0's
instead of a's and t's, but bioinformatics software (such as
ClustalX2) accepts codes for DNA bases. These are 'a' (adenine), 'g'
(guanine), 'c' (cytosine) and 't' (thymine).
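The coding step above can be sketched in a few lines of Python. This is only an illustration: the subunit numbers for each parallel are the ones stated in the post, while the function name `encode` and the dictionary layout are assumptions made for the example.

```python
def encode(present, total):
    """Code subunits 1..total as 't' if present, otherwise 'a'."""
    return "".join("t" if i in present else "a" for i in range(1, total + 1))


# Subunit numbers as stated in the post for each parallel text.
parallels = {
    "A": {2, 4, 6, 7},
    "B": {2, 4, 5, 6, 7},
    "C": {1, 2, 4, 5, 6, 7},
    "D": {3, 4, 5, 7, 8},
}

# Eight subunits in total for the 'Dog and fleas' example.
sequences = {name: encode(units, 8) for name, units in parallels.items()}
for name, seq in sequences.items():
    print(name, seq)
```

Keeping the subunit numbers in one place and generating the strings from them avoids hand-typing errors in long sequences.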
Now the binary sequences must be analyzed with ClustalX2 and a tree
displayed with NJPLOT or PHYLIP. To do this, the coded binary
sequences must be in the proper format. First, the following file is
created in a text editor such as NotePad, then saved as
DogAndFleas.txt. (The file is between the lines of asterisks. The
asterisks are not in the file).
>A Dog and fleas1 atatatta
>B Dog and fleas1 atatttta
>C Dog and fleas1 ttatttta
>D Dog and fleas1 aatttatt
The format of the file is crucial. '>' begins the title of a sequence.
'1 ' begins the sequence.
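As a hedged alternative to typing the file by hand, the sketch below builds the input text in standard FASTA layout: a title line beginning with '>', then the sequence on its own line. This is an assumption on my part. It is a format ClustalX2 reads, but it is not necessarily the exact layout of the file in the post, whose line breaks may not have survived. The function name `to_fasta` is hypothetical.

```python
def to_fasta(sequences, title):
    """Build FASTA text: a '>' title line followed by the sequence line."""
    lines = []
    for name, seq in sequences.items():
        lines.append(f">{name} {title}")
        lines.append(seq)
    return "\n".join(lines) + "\n"


# Coded sequences from the 'Dog and fleas' example.
sequences = {
    "A": "atatatta",
    "B": "atatttta",
    "C": "ttatttta",
    "D": "aatttatt",
}

fasta_text = to_fasta(sequences, "Dog and fleas")
print(fasta_text)
```

The printed text can be pasted into NotePad and saved as DogAndFleas.txt, or written out with Python's built-in `open`.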
To generate the ClustalX2 sequence alignment, double-click
clustalx.exe. When ClustalX2 opens, click File, then click Load
Sequences, then navigate to the DogAndFleas.txt file, then click the
Open button. This displays the sequence alignment:
Now the tree file is created. Click Trees, then click Draw Tree. In
the Draw Trees window, navigate to the folder where the tree file will
be saved. Note the file extension (.ph or .phy). Click the OK button.
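For a rough feel of what the tree is built from, the sketch below counts the positions at which each pair of coded sequences differs. It is only an approximation under stated assumptions: ClustalX2 builds its tree from pairwise distances between the aligned sequences, but this code is not ClustalX2's own calculation and does not build a tree. The function name `differences` is hypothetical.

```python
from itertools import combinations


def differences(seq_a, seq_b):
    """Count the positions where two equal-length coded sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(1 for x, y in zip(seq_a, seq_b) if x != y)


# Coded sequences from the 'Dog and fleas' example.
sequences = {
    "A": "atatatta",
    "B": "atatttta",
    "C": "ttatttta",
    "D": "aatttatt",
}

# One entry per unordered pair, e.g. ("A", "B").
distances = {
    (a, b): differences(sequences[a], sequences[b])
    for a, b in combinations(sequences, 2)
}
for (a, b), count in distances.items():
    print(a, b, count, count / len(sequences[a]))
```

The last column is the fraction of subunits on which the two texts disagree, which is easier to compare across sayings with different numbers of subunits.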
Finally, we are ready to display the tree. To display a rooted tree,
open NJPLOT by double-clicking njplot.exe. Click File, then click
Open, then navigate to DogAndFleas.ph, then click the Open button. A
rooted tree is displayed. To display the branch lengths, check the
Branch Lengths checkbox. The resulting tree image can be copied to
Word, WordPad, OpenOffice Text, or a graphics editor (PhotoShop, GIMP,
To display an unrooted tree, copy the DogAndFleas.ph file to the
folder containing the PHYLIP drawtree.exe file. On my pc the path to
that folder is C:\phylip3.67\exe. Rename DogAndFleas.ph as intree (no
file extension). To generate the tree, double-click drawtree.exe, then
enter 'Y' in the command line window. This displays an unrooted tree
in another window. You can modify how the tree is displayed and
also save it as a bitmap.
--- In email@example.com, "Paul Lanier" <jpaullanier@...> wrote:
> Mk 4:30-32 // Lk 13:18-19 // Mt 13:31-32:
> _______ Mt 13:31-32
> | |__________ Lk 13:18-19
> | _____________ GTh 20
> |_____________ Mk 4:30-32
> The distances between branches measure their differences. Here
> Mt and Lk are close parallels. GTh and Mk are also close parallels.
> The tree can be interpreted in terms of common ancestors. In this
> interpretation, which is appropriate for the Four-Source Hypothesis,
> Mt and Lk share a common ancestor (such as Q), while GTh and Mk also
> share a common ancestor (such as proto-Thomas or proto-Mark). The
> tree can also be interpreted in terms of one text depending on its
> closest neighbor. For proponents of the Farrer Hypothesis, Lk
> borrows from Mt, while GTh borrows from Mk. The Mt/Lk branch
> borrows from the GTh/Mk branch.
I think this is an interesting way of visualizing similarities, but
how does this help in determining literary relationships? It seems
you are presupposing that these subunits were free-floating subunits
in oral tradition that gradually evolved as time passed. Mark,
Matthew and Luke were creative authors who composed a narrative and
therefore regularly adapted available materials for the goals of
their composition. We therefore need to take literary criticism and
narrative criticism into account in determining relationships.
Similarity alone doesn't seem to get us that far (unless the verbal
similarity is strong enough to necessitate borrowing and one can find
the redaction of the one in the other, but even then there are
disagreements as to what constitutes redaction, and even when there
is no such disagreement one often finds scholars discounting the
agreement on the basis of one of the weakest arguments out there:
scribal textual corruption).
Let's take your mustard seed example. I am a proponent of the Farrer
hypothesis, but I do not see why according to this hypothesis Thomas
necessarily borrows from Mark. I have a different view on this issue
while still holding to the Farrer hypothesis. Likewise, some
proponents of the two source (or four source) hypothesis (like
Christopher Tuckett) believe that Thomas is dependent on Mark for
this saying. Many of his fellow two source proponents disagree with
him on this, but the visual depiction of similarity will not be the
deciding factor.
I hope you don't mind my regular critique of your recent posts. It is
just that I find your posts interesting and relevant for my own work
even though I seem to have quite a different view on these issues
than you do! :)
--- In firstname.lastname@example.org, "rj.godijn" <rj.godijn@...> wrote:
> > I think this is an interesting way of visualizing similarities, but how does this help in determining literary relationships?
Thank you for your patience. I may never catch up!
In general, cluster analysis trees help one to visualize and explore
data relationships more easily. For example, the group file,
Lanier3.pdf, shows a very simple unrooted tree for an example parallel
text set, 'Dog and fleas.' This tree displays visually the distances
between texts. In cluster analysis, 'distance' is computed from an
input file that specifies the properties of the objects that will be
clustered. The tree is constructed such that objects with smaller
distances are closer together on the tree. If there is a relationship
between distance and some other property then that relationship can be
quantitated objectively and quickly. The technique is routinely used
to establish genetic relationships among biological species, and this
of course is related to inheritance (evolutionary descent). Thus
cluster analysis verifies and establishes biological evolutionary
In the example, 'Dog and fleas,' the texts are related according to
lines that run from D to A, then to B, then to C. Now if you look at
the individual texts, you could propose that Text A is the earliest,
that B and D are both derived from A, and that C is derived from B.
And there is a metric that supports that: the number of distinct text
subunits in A is lower than that of any other. Thus one might propose
that A, the simplest text by a completely objective method, is the
original. This can also be concluded independently by simply examining
the literary properties of the texts. So the two methods confirm each
other.
Other solutions are possible but less likely. C does not have to be
derived from B; it could have been derived from D. But that is not the
simplest explanation (Occam's Razor). Obviously this method would
not be able to determine that, if that were actually the case. If a
literary analysis determined, for example, that D was earliest, then
the tree would suggest that A derives from D, B derives from A, and C
derives from B.
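The metric mentioned above, the number of distinct text subunits in each parallel, can be checked directly from the coded sequences. This is a minimal sketch using the 'Dog and fleas' sequences from the tutorial; the variable names are hypothetical.

```python
# Coded sequences from the 'Dog and fleas' example.
sequences = {
    "A": "atatatta",
    "B": "atatttta",
    "C": "ttatttta",
    "D": "aatttatt",
}

# Each 't' marks one subunit that is present in the text.
subunit_counts = {name: seq.count("t") for name, seq in sequences.items()}
simplest = min(subunit_counts, key=subunit_counts.get)

for name, count in subunit_counts.items():
    print(name, count)
print("fewest subunits:", simplest)
```

A lower count says only that a text is shorter in coded subunits. Whether that makes it earlier is the interpretive step discussed above.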
> We therefore need to take literary criticism and narrative criticism into account in determining relationships.
Yes. I hope cluster analysis can supplement these. But we will have to
> I have a different view on this issue while still holding to the Farrer hypothesis. Likewise, some proponents of the two source (or
four source) hypothesis (like Christopher Tuckett) believe that Thomas
is dependent on Mark for this saying. Many of his fellow two source
proponents disagree with him on this, but the visual depiction of
similarity will not be the deciding factor.
I am guessing you are right, that a rootless tree can be interpreted
either way. But I will need to look at how several sets of text
parallels behave to see if this is true.
> you are presupposing that these subunits were free-floating subunits in oral tradition that gradually evolved as time passed
I don't know about gradual! Some allow "a generation" for texts to
propagate, but I am thinking this is a ballpark, best-estimate
concept. Paul was driven from city to city fairly often, and he wrote
about it. Communications among Mediterranean ports could have been
frequent. And the Jewish War surely prompted drastic and immediate
changes in outlook and therefore sayings and texts. I think oral
sayings may evolve much more quickly than written ones, but still one finds
obvious additions to the fourth century written text (John's Woman
Caught in Adultery, the ending of Mark). Text reframing, in general,
may be the norm rather than the exception. I don't think I am out on a
limb on this!
> I hope you don't mind my regular critique of your recent posts. It is just that I find your posts interesting and relevant for my own
work even though I seem to have quite a different view on these issues
than you do!
Always welcome. Thank you, Richard, and please accept my apology for
the extreme delay in responding.