RE: [GTh] Visualizing Hyper-Synopsis with ClustalX2

  • Judy Redman
    Message 1 of 5 , Aug 12, 2008
      Paul,

      This sounds interesting but I am not sure if I have my mind around it.

      What actually do you feed into the software? Is it the actual text in each
      case or what you consider to be the text subunits? If it's the actual text,
      then if you add Thomas into the mix, you have to use a translation to get
      all the pieces of text you're comparing into the one language? If it's the
      text subunits, then it doesn't matter much, as far as I can see, what
      language your text is in, but it will only analyse the content or 'gist' not
      other aspects such as which tense is used in which piece of text. Maybe I'm
      just tired, but I'm not quite sure I understand and I'd like to.

      Regards

      Judy

      --
      Rev Judy Redman
      Uniting Church Chaplain
      University of New England
      Armidale 2351 Australia
      ph: +61 2 6773 3739
      fax: +61 2 6773 3749
      web: http://www.une.edu.au/chaplaincy/uniting/ and
      http://blog.une.edu.au/unitingchaplaincy/
      email: jredman@...


      > -----Original Message-----
      > From: gthomas@yahoogroups.com
      > [mailto:gthomas@yahoogroups.com] On Behalf Of Paul Lanier
      > Sent: Sunday, 10 August 2008 6:29 AM
      > To: gthomas@yahoogroups.com
      > Subject: [GTh] Visualizing Hyper-Synopsis with ClustalX2
      >
      > Parallel relationships among the gospels, for a specific
      > saying or account, have been difficult to visualize or to
      > determine with complete objectivity. Schemes that arrange
      > parallel texts in adjacent columns, such as the synopsis of
      > Huck-Lietzmann (1936), Throckmorton's Gospel Parallels (1989)
      > or Mahlon Smith's Hyper-Synopsis
      > (www.virtualreligion.net/primer/mustard.html), are extremely useful.
      > These permit quick comparisons. However the relationships
      > between the parallel texts must be deduced separately.
      > Moreover the arguments used to determine these
      > relationships are not always completely objective.
      > Conclusions about text relationships are affected to a great
      > degree by selective emphasis on some criteria and
      > minimization of others. This is necessarily so because a
      > fully objective and comprehensive analytical method for
      > determining parallel text relationships has not been developed.
      >
      > Here I propose the use of the bioinformatics program,
      > ClustalX2, for determining and visualizing parallel gospel
      > text relationships.
      > ClustalX2 (http://www.clustal.org/) determines relationships
      > among biological species by computing differences between DNA
      > gene sequences. The results are visualized as an evolutionary
      > (phylogenetic) tree by feeding the ClustalX2 output into
      > NJPLOT, a tree-generating program (Perrière, G. and Gouy, M.
      > 1996 WWW-Query: An on-line retrieval system for biological
      > sequence banks. Biochimie, 78, 364-369). ClustalX2 and NJPLOT
      > can be employed similarly with parallel gospel texts when
      > each text is coded for the absence or presence of distinct
      > textual subunits. This approach is described in greater
      > detail in the group file, Visualizing Hyper-Synopsis with
      > ClustalX2, where it is applied to gospel parallels for Logia
      > 5, 20, 54 and 94. An example of the approach for GTh 20
      > parallels is presented here.
      >
      > ClustalX generates the following tree for the parallel texts
      > GTh 20 // Mk 4:30-32 // Lk 13:18-19 // Mt 13:31-32:
      >
      >
      > _______ Mt 13:31-32
      > __________|
      > | |__________ Lk 13:18-19
      > |
      > |
      > | _____________ GTh 20
      > |__|
      > |_____________ Mk 4:30-32
      >
      >
      > The distances between branches measure their differences.
      > Here Mt and Lk are close parallels. GTh and Mk are also close
      > parallels. The tree can be interpreted in terms of common
      > ancestors. In this interpretation, which is appropriate for
      > the Four-Source Hypothesis, Mt and Lk share a common ancestor
      > (such as Q), while GTh and Mk also share a common ancestor
      > (such as proto-Thomas or proto-Mark). The tree can also be
      > interpreted in terms of one text depending on its closest
      > neighbor. For proponents of the Farrer Hypothesis, Lk borrows
      > from Mt, while GTh borrows from Mk. The Mt/Lk branch borrows
      > from the GTh/Mk branch.
      >
      > The tree is a visual representation of the following coded
      > sequences for the parallel texts. Here a binary code
      > represents the absence (coded 'a') or the presence (coded 't',
      > for text) of a specific subunit. A list of all text subunits is
      > followed by the coded sequences for each parallel text. The text
      > source used here is the Scholar's Version.
      >
      > Mustard Seed (all text subunits)
      > 1. two questions
      > 2. mustard seed
      > 3. smallest
      > 4. falls
      > 5. tossed
      > 6. sown/sowed
      > 7. ground
      > 8. field
      > 9. prepared soil
      > 10. garden
      > 11. comes up
      > 12. grew/grows up
      > 13. produces
      > 14. large plant
      > 15. biggest/largest of garden plants
      > 16. became/becomes tree
      > 17. branches
      > 18. shelter
      > 19. birds
      > 20. of the sky
      > 21. roost/roosted in its branches
      > 22. nest in its shade
      >
      >                       1         2
      >              1234567890123456789012
      >
      > GTh 20:1-4   atttaaaataaattaaatttaa
      > Mk 4:30-32   tttaattaaatatatatattat
      > Lk 13:18-19  ttaataaaatataaataattta
      > Mt 13:31-32  attaatataaataattaattta
      >
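One way to see what the tree summarizes is to count, for each pair of coded sequences above, the positions at which they differ. The Python sketch below does only that. It is an illustration, not a reimplementation of ClustalX2, which applies its own distance measure and tree-building step; the names `SEQS`, `hamming` and `PAIRS` are invented for this example.

```python
from itertools import combinations

# Coded sequences copied from the post ('a' = absent, 't' = text present).
SEQS = {
    "GTh 20:1-4": "atttaaaataaattaaatttaa",
    "Mk 4:30-32": "tttaattaaatatatatattat",
    "Lk 13:18-19": "ttaataaaatataaataattta",
    "Mt 13:31-32": "attaatataaataattaattta",
}


def hamming(s, t):
    """Number of subunit positions at which two coded sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(s, t))


# All six pairs, smallest distance (closest parallels) first.
PAIRS = sorted(
    (hamming(SEQS[a], SEQS[b]), a, b) for a, b in combinations(SEQS, 2)
)

if __name__ == "__main__":
    for d, a, b in PAIRS:
        print(f"{a:12} {b:12} {d}")
```

On these four sequences the closest pair is Lk 13:18-19 and Mt 13:31-32, which differ at 7 of the 22 subunit positions.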
      > The method appears generally useful for measuring and
      > visualizing differences between parallel texts with complete
      > objectivity. It can be argued that the analysis should be
      > performed with Greek or Strong's Numbers rather than an
      > English translation. In principle that seems a valid
      > argument, one which points to a failure of textual criticism
      > to provide an adequate comparison text (at least for some
      > passages). It can also be argued that the method does not
      > allow for comparisons of style (chreia, chiasm, etc). This is
      > true at present, but this can be corrected by adding to each
      > sequence terms for various styles.
      > Finally, the method is in an early stage of development. I
      > think it will be interesting to see how various Thomas
      > parallel sayings are visualized, especially if recurring
      > patterns emerge. Comments?
      >
      > regards, Paul Lanier
      >
      >
      >
      > ------------------------------------
      >
      > Gospel of Thomas Homepage: http://home.epix.net/~miser17/Thomas.html
      > Interlinear translation:
      > http://www.geocities.com/mwgrondin/x_transl.htm
      >
    • Paul Lanier
      Message 2 of 5 , Aug 12, 2008
        --- In gthomas@yahoogroups.com, "Judy Redman" <jredman@...> wrote:
        > What actually do you feed into the software? Is it the actual text
        > in each case or what you consider to be the text subunits?

        Hi Judy,

        Text subunits. For details of the complete process, see the tutorial
        at the end of this post. Apologies for my feeble attempt to draw a
        tree in the post. It displayed correctly when I previewed the post,
        but the post deleted spaces. I am classifying it as 'Yahoo scribal
        error' !

        > if you add Thomas into the mix, you have to use a translation to get
        > all the pieces of text you're comparing into the one language.

        Yes. This requires a consistent translation. I don't see a way around
        this problem, except maybe to translate Coptic GTh into the equivalent
        NT Greek, and that raises perhaps more difficult issues. So far the
        results seem reasonable, especially with PHYLIP unrooted trees. For
        example, with GTh 41 parallel texts, the resulting unrooted tree is
        similar to what is expected: GTh is far from the synoptics, and the
        Mt/Mk branch is close to the Lk branch. Now a very interesting thing
        occurs when a hypothetical "Terse Saying 41" ('Those who have nothing
        will be deprived of nothing.') is added. Now the tree has Mt and Lk on
        the same branch (consistent with Q), with the nearby Mk branch closer
        to the TS41/GTh41 branch. All of this of course makes perfect sense:
        in going from a proto-GTh to Lk, first GTh branches off, then Mk, then
        Mt. Even the lengths of the branches suggest the time period
        involved, and are consistent with what one might expect for oral tradition.

        This really needs to be visualized. I will post a file on it in a day
        or two.

        > it will only analyse the content or 'gist' not other aspects such as
        > which tense is used in which piece of text.

        Yes. However the one performing the analysis chooses how to identify a
        distinct text subunit. It could be a series of identical words, a
        series of Strong's Numbers, or even a series of equivalent words. The
        series can include other information, such as whether the text
        contains a specified style. The only requirement is that each parallel
        text be reduced to a coded binary sequence. There is great flexibility
        here. I will be exploring the use of Strong's Numbers to see if the
        method is valid for some non-Thomas synoptic parallels.

        The tutorial follows.

        regards, Paul


        CLUSTALX2/NJPLOT/PHYLIP TUTORIAL

        For clarity, here is an example of the complete process. This assumes
        ClustalX2, NJPLOT and PHYLIP are installed on Windows XP.
        Incidentally, ClustalX2, NJPLOT and PHYLIP are all free. They also run
        as portable applications from a USB drive.

        First, the distinct text subunits are identified and listed in order.
        Then each parallel text is converted to a coded binary sequence where
        'a' = absent, 't' = text. As an example, suppose we have these
        parallel texts for the saying, 'Dog and fleas.'

        A. My dog has fleas.
        B. My dog has lots of fleas.
        C. Uh oh, my dog has lots of fleas.
        D. A dog has fleas but many dogs have no fleas.

        The ordered list of distinctive text subunits is:
        1. uh oh
        2. my
        3. a
        4. dog
        5. has
        6. lots of
        7. fleas
        8. but many dogs have no fleas

        Next, coded binary sequences for each saying are created. Parallel 'A'
        contains text subunits 2,4,5,7. So its coded binary sequence is
        atattata, where 1=absent('a'), 2=text('t'), 3=absent('a'),
        4=text('t'), etc.

        Likewise parallel 'B' contains text subunits 2,4,5,6,7. So its binary
        sequence is atatttta.
        'C' contains text subunits 1,2,4,5,6,7. Its binary sequence is ttatttta.
        Finally, 'D' = 3,4,5,7,8 = aatttatt.

        So for these four parallel texts, the sequence alignment is:

        A atattata
        B atatttta
        C ttatttta
        D aatttatt
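The coding step can be sketched in Python. In the post the analyst decides by eye which subunits a text contains; the sketch below assumes instead that a subunit counts as present when it occurs as whole words in the lowercased, punctuation-free text. That matching rule, and the names `SUBUNITS`, `TEXTS`, `encode` and `CODED`, are assumptions made for this example only.

```python
import re

# Ordered list of distinctive text subunits, as given in the post.
SUBUNITS = [
    "uh oh", "my", "a", "dog", "has", "lots of", "fleas",
    "but many dogs have no fleas",
]

TEXTS = {
    "A": "My dog has fleas.",
    "B": "My dog has lots of fleas.",
    "C": "Uh oh, my dog has lots of fleas.",
    "D": "A dog has fleas but many dogs have no fleas.",
}


def encode(text, subunits=SUBUNITS):
    """Return one 't' (present) or 'a' (absent) per subunit, in list order."""
    # Assumption: lowercase, strip punctuation, match on whole words only.
    norm = re.sub(r"[^a-z ]", "", text.lower())
    return "".join(
        "t" if re.search(r"\b" + re.escape(unit) + r"\b", norm) else "a"
        for unit in subunits
    )


# Coded sequence for every parallel text.
CODED = {name: encode(text) for name, text in TEXTS.items()}

if __name__ == "__main__":
    for name, seq in CODED.items():
        print(name, seq)
```

Text A contains 'has' (subunit 5) but not 'lots of' (subunit 6), so under this rule it is coded atattata. A simple whole-word rule like this will not suit every text; for real parallels the subunit assignment remains a judgment call.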

        Now, to construct a tree of relationships, the binary sequences are
        treated as DNA gene sequences. It would be nice to use 1's and 0's
        instead of a's and t's, but bioinformatics software (such as
        ClustalX2) accepts codes for DNA bases. These are 'a' (adenine), 'g'
        (guanine), 'c' (cytosine) and 't' (thymine).

        Now the binary sequences must be analyzed with ClustalX2 and a tree
        displayed with NJPLOT or PHYLIP. To do this, the coded binary
        sequences must be in the proper format. First, the following file is
        created in a text editor such as NotePad, then saved as
        DogAndFleas.txt. (The file is between the lines of asterisks. The
        asterisks are not in the file.)
        *****************************************
        >A Dog and fleas
        1 atattata

        >B Dog and fleas
        1 atatttta

        >C Dog and fleas
        1 ttatttta

        >D Dog and fleas
        1 aatttatt

        *****************************************

        The format of the file is crucial. '>' begins the title of a sequence.
        '1 ' begins the sequence.
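For more than a handful of texts, typing this file by hand invites mistakes. The Python sketch below generates text in the same layout as the file shown above (a '>' title line, then a line starting with '1 '). The function names and the check that only 'a' and 't' occur are assumptions for this example, not part of ClustalX2.

```python
# Coded sequences for the four 'Dog and fleas' parallels.
# A holds subunits 2, 4, 5, 7 (my, dog, has, fleas).
SEQS = {"A": "atattata", "B": "atatttta", "C": "ttatttta", "D": "aatttatt"}


def to_clustal_input(seqs, saying="Dog and fleas"):
    """Build the file text in the layout shown in the post."""
    blocks = []
    for name, seq in seqs.items():
        # Guard against stray characters in a hand-coded sequence.
        if set(seq) - set("at"):
            raise ValueError(f"{name}: only 'a' and 't' are expected")
        blocks.append(f">{name} {saying}\n1 {seq}\n")
    # A blank line separates consecutive entries.
    return "\n".join(blocks)


def write_clustal_input(path, seqs, saying="Dog and fleas"):
    """Save the generated text, e.g. as DogAndFleas.txt."""
    with open(path, "w", encoding="ascii") as handle:
        handle.write(to_clustal_input(seqs, saying))


if __name__ == "__main__":
    print(to_clustal_input(SEQS))
```

Calling `write_clustal_input("DogAndFleas.txt", SEQS)` would produce a file that can then be loaded through File, Load Sequences as described in the next step.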

        To generate the ClustalX2 sequence alignment, double-click
        clustalx.exe. When ClustalX2 opens, click File, then click Load
        Sequences, then navigate to the DogAndFleas.txt file, then click the
        Open button. This displays the sequence alignment:

        A atattata
        B atatttta
        C ttatttta
        D aatttatt

        Now the tree file is created. Click Trees, then click Draw Tree. In
        the Draw Trees window, navigate to the folder where the tree file will
        be saved. Note the file extension (.ph or .phy). Click the OK button.

        Finally, we are ready to display the tree. To display a rooted tree,
        open NJPLOT by double-clicking njplot.exe. Click File, then click
        Open, then navigate to DogAndFleas.ph, then click the Open button. A
        rooted tree is displayed. To display the branch lengths, check the
        Branch Lengths checkbox. The resulting tree image can be copied to
        Word, WordPad, OpenOffice Text, or a graphics editor (PhotoShop, GIMP,
        Fireworks, etc).

        To display an unrooted tree, copy the DogAndFleas.ph file to the
        folder containing the PHYLIP drawtree.exe file. On my pc the path to
        that folder is C:\phylip3.67\exe. Rename DogAndFleas.ph as intree (no
        file extension). To generate the tree, double-click drawtree.exe, then
        enter 'Y' in the command line window. This displays an unrooted tree
        in another window. You can modify how the tree is displayed and
        also save it as a bitmap.
      • rj.godijn
        Message 3 of 5 , Aug 13, 2008
          --- In gthomas@yahoogroups.com, "Paul Lanier" <jpaullanier@...> wrote:
          >
          > Mk 4:30-32 // Lk 13:18-19 // Mt 13:31-32:
          >
          >
          > _______ Mt 13:31-32
          > __________|
          > | |__________ Lk 13:18-19
          > |
          > |
          > | _____________ GTh 20
          > |__|
          > |_____________ Mk 4:30-32
          >
          >
          > The distances between branches measure their differences. Here
          > Mt and Lk are close parallels. GTh and Mk are also close parallels.
          > The tree can be interpreted in terms of common ancestors. In this
          > interpretation, which is appropriate for the Four-Source Hypothesis,
          > Mt and Lk share a common ancestor (such as Q), while GTh and Mk also
          > share a common ancestor (such as proto-Thomas or proto-Mark). The
          > tree can also be interpreted in terms of one text depending on its
          > closest neighbor. For proponents of the Farrer Hypothesis, Lk
          > borrows from Mt, while GTh borrows from Mk. The Mt/Lk branch
          > borrows from the GTh/Mk branch.


          Hi Paul,

          I think this is an interesting way of visualizing similarities, but
          how does this help in determining literary relationships? It seems
          you are presupposing that these subunits were free-floating subunits
          in oral tradition that gradually evolved as time passed. Mark,
          Matthew and Luke were creative authors who composed a narrative and
          therefore regularly adapted available materials for the goals of
          their composition. We therefore need to take literary criticism and
          narrative criticism into account in determining relationships.
          Similarity alone doesn't seem to get us that far (unless the verbal
          similarity is strong enough to necessitate borrowing and one can find
          the redaction of the one in the other, but even then there are
          disagreements as to what constitutes redaction, and even when there
          is no such disagreement one often finds scholars discounting the
          agreement on the basis of one of the weakest arguments out there:
          scribal textual corruption).

          Let's take your mustard seed example. I am a proponent of the Farrer
          hypothesis, but I do not see why according to this hypothesis Thomas
          necessarily borrows from Mark. I have a different view on this issue
          while still holding to the Farrer hypothesis. Likewise, some
          proponents of the two source (or four source) hypothesis (like
          Christopher Tuckett) believe that Thomas is dependent on Mark for
          this saying. Many of his fellow two source proponents disagree with
          him on this, but the visual depiction of similarity will not be the
          deciding factor.

          I hope you don't mind my regular critique of your recent posts. It is
          just that I find your posts interesting and relevant for my own work
          even though I seem to have quite a different view on these issues
          than you do! :)

          Regards, Richard
        • Paul Lanier
          Message 4 of 5 , Aug 20, 2008
            --- In gthomas@yahoogroups.com, "rj.godijn" <rj.godijn@...> wrote:

            > I think this is an interesting way of visualizing similarities,
            > but how does this help in determining literary relationships?

            Hi Richard,

            Thank you for your patience. I may never catch up!

            In general, cluster analysis trees help one to visualize and explore
            data relationships more easily. For example, the group file,
            Lanier3.pdf, shows a very simple unrooted tree for an example parallel
            text set, 'Dog and fleas.' This tree displays visually the distances
            between texts. In cluster analysis, 'distance' is computed from an
            input file that specifies the properties of the objects that will be
            clustered. The tree is constructed such that objects with smaller
            distances are closer together on the tree. If there is a relationship
            between distance and some other property then that relationship can be
            quantified objectively and quickly. The technique is routinely used
            to establish genetic relationships among biological species, and this
            of course is related to inheritance (evolutionary descent). Thus
            cluster analysis verifies and establishes biological evolutionary
            relationships.
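The idea that "objects with smaller distances are closer together on the tree" can be illustrated with a few lines of Python. Note that this is not the algorithm ClustalX2 uses: the sketch substitutes plain average-linkage clustering on mismatch counts, chosen only because it is short. All names here are invented for the example.

```python
from itertools import combinations

# Coded 'Dog and fleas' sequences; A holds subunits 2, 4, 5, 7.
SEQS = {"A": "atattata", "B": "atatttta", "C": "ttatttta", "D": "aatttatt"}


def hamming(s, t):
    """Number of subunit positions at which two coded sequences differ."""
    return sum(x != y for x, y in zip(s, t))


def cluster_distance(c1, c2, seqs):
    """Average mismatch count over all cross-cluster pairs."""
    total = sum(hamming(seqs[a], seqs[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def average_linkage(seqs):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [(name,) for name in seqs]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(pair[0], pair[1], seqs),
        )
        merges.append((c1, c2, cluster_distance(c1, c2, seqs)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


MERGES = average_linkage(SEQS)

if __name__ == "__main__":
    for left, right, dist in MERGES:
        print(left, "+", right, "at distance", dist)
```

With these sequences, A and B merge first, C joins them next, and D is the last text to join, meaning it is the most distant from the other three.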

            In the example, 'Dog and fleas,' the texts are related according to
            lines that run from D to A, then to B, then to C. Now if you look at
            the individual texts, you could propose that Text A is the earliest,
            that B and D are both derived from A, and that C is derived from B.
            And there is a metric that supports that: the number of distinct text
            subunits in A is lower than that of any other. Thus one might propose
            that A, the simplest text by a completely objective method, is the
            original. This can also be concluded independently by simply examining
            the literary properties of the texts. So the two methods confirm each
            other.
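The metric mentioned here, the number of distinct text subunits each text contains, is simply the count of 't' codes in its sequence. A minimal Python sketch, with names invented for the example:

```python
# Coded 'Dog and fleas' sequences; A holds subunits 2, 4, 5, 7.
SEQS = {"A": "atattata", "B": "atatttta", "C": "ttatttta", "D": "aatttatt"}

# Number of subunits present in each text ('t' = text present).
SUBUNIT_COUNTS = {name: seq.count("t") for name, seq in SEQS.items()}

# The text with the fewest subunits, i.e. the simplest by this metric.
SIMPLEST = min(SUBUNIT_COUNTS, key=SUBUNIT_COUNTS.get)

if __name__ == "__main__":
    print(SUBUNIT_COUNTS, SIMPLEST)
```

Text A has 4 subunits, B and D have 5 each, and C has 6, so A is the simplest text by this count.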

            Other solutions are possible but less likely. C does not have to be
            derived from B; it could have been derived from D. But that is not the
            simplest explanation (Occam's Razor). Obviously this method would
            not be able to determine that, if that were actually the case. If a
            literary analysis determined, for example, that D was earliest, then
            the tree would suggest that A derives from D, B derives from A, and C
            derives from B.

            > We therefore need to take literary criticism and narrative criticism
            > into account in determining relationships.

            Yes. I hope cluster analysis can supplement these. But we will have to
            see!

            > I have a different view on this issue while still holding to the
            > Farrer hypothesis. Likewise, some proponents of the two source (or
            > four source) hypothesis (like Christopher Tuckett) believe that Thomas
            > is dependent on Mark for this saying. Many of his fellow two source
            > proponents disagree with him on this, but the visual depiction of
            > similarity will not be the deciding factor.

            I am guessing you are right, that an unrooted tree can be interpreted
            either way. But I will need to look at how several sets of text
            parallels behave to see if this is true.

            > you are presupposing that these subunits were free-floating subunits
            > in oral tradition that gradually evolved as time passed

            I don't know about gradual! Some allow "a generation" for texts to
            propagate, but I am thinking this is a ballpark, best-estimate
            concept. Paul was driven from city to city fairly often, and he wrote
            about it. Communications among Mediterranean ports could have been
            frequent. And the Jewish War surely prompted drastic and immediate
            changes in outlook and therefore sayings and texts. I think oral
            sayings may evolve much more quickly than written ones, but still one finds
            obvious additions to the fourth century written text (John's Woman
            Caught in Adultery, the ending of Mark). Text reframing, in general,
            may be the norm rather than the exception. I don't think I am out on a
            limb on this!

            > I hope you don't mind my regular critique of your recent posts. It
            > is just that I find your posts interesting and relevant for my own
            > work even though I seem to have quite a different view on these issues
            > than you do!

            Always welcome. Thank you, Richard, and please accept my apology for
            the extreme delay in responding.

            regards,
            Paul Lanier