[Synoptic-L] principal component analysis
- Dave Gentile wrote (some time ago) --
>This might be of some interest. It will probably be unclear to some,
>but hopefully somewhat interesting anyway. It's another statistical
>technique called "principal component analysis". The first principal
>component is chosen to account for as much variation in the data as
>possible. The second is then chosen to account for as much of what is
>left as possible, and so on. Here are the first 5, based on the current data:
>      Prin1     Prin2     Prin3     Prin4     Prin5
>F222  0.265535  0.079434  -.169146  -.272275  -.453310
>F211  0.249461  -.191359  0.462959  -.259347  0.001984
>F112  0.254064  -.149691  0.126356  0.587037  0.131907
>F221  0.257069  0.201322  0.302105  -.211982  -.006538
>F122  0.249635  0.268697  0.205017  0.404658  -.356178
>F121  0.258851  0.372056  0.110780  0.030916  0.314782
>F220  0.253783  0.323832  -.247425  -.162903  -.407663
>F120  0.256841  0.371185  -.105333  -.100304  0.329760
>F210  0.253932  -.228439  0.444596  -.074114  -.073266
>F020  0.253991  0.186352  -.294810  -.020725  0.402324
>F202  0.251105  -.406807  -.281229  -.150218  -.025970
>F201  0.258880  -.292340  -.080188  -.192987  0.303941
>F102  0.249604  -.225670  -.385551  0.291666  -.122142
>F002  0.279702  -.087882  -.008328  0.301434  -.029702
>#1 weights everything about equally, so it is easy to interpret. It figured out
>that the biggest source of variation is that some words are just more
>common, in general. In short, they are all written in the Greek
>#2 separates Mark from non-Mark. Everything with a 2 in the middle is
>positive, the rest is negative.
>202 and 121 are the poles.
>#3 seems to want to separate Matthew from Q/proto-Mt. 211 and 210 are
>the big positives, 102 is the big negative.
>#4 clearly picks Luke out from everything else.
>#5 seems to separate 020 from 222.
>The first component accounts for 75% of the variation, the first five
>account for 95% of the variation.
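For readers unfamiliar with the technique, the procedure described above (find the direction of greatest variation, remove it, repeat) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program that produced the table: the data are synthetic, the function names (`covariance_matrix`, `leading_eigenpair`, `principal_components`) are hypothetical, and power iteration with deflation is swapped in for whatever routine the statistics package actually used.

```python
import math
import random

def covariance_matrix(rows):
    """Sample covariance between the columns of `rows` (one row per observation)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def leading_eigenpair(matrix, iters=500, seed=0):
    """Power iteration: largest eigenvalue and unit eigenvector of a symmetric matrix."""
    p = len(matrix)
    rng = random.Random(seed)
    v = [rng.uniform(0.5, 1.5) for _ in range(p)]      # arbitrary positive start
    length = math.sqrt(sum(x * x for x in v))
    v = [x / length for x in v]
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:                                # nothing left to explain
            break
        v = [x / norm for x in w]
    value = sum(v[i] * matrix[i][j] * v[j] for i in range(p) for j in range(p))
    return value, v

def principal_components(rows, k):
    """Return k pairs (share of total variance, loading vector), largest first."""
    cov = covariance_matrix(rows)
    p = len(cov)
    total = sum(cov[i][i] for i in range(p))
    result = []
    for _ in range(k):
        value, vec = leading_eigenpair(cov)
        result.append((value / total, vec))
        # Deflation: subtract what this component explains, then look again.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(p)]
               for i in range(p)]
    return result

# Synthetic data (an assumption, not the synoptic word counts): four columns
# share one large common factor; a smaller second factor pushes the first two
# columns one way and the last two the other way.
rng = random.Random(0)
rows = []
for _ in range(200):
    common = rng.gauss(0, 3)
    split = rng.gauss(0, 1)
    rows.append([common + split + rng.gauss(0, 0.3),
                 common + split + rng.gauss(0, 0.3),
                 common - split + rng.gauss(0, 0.3),
                 common - split + rng.gauss(0, 0.3)])

components = principal_components(rows, 2)
for share, loadings in components:
    print(round(share, 3), [round(x, 3) for x in loadings])
```

In this toy data the first component should load all four columns with the same sign and take most of the variance, and the second should separate the first two columns from the last two. That is the same general shape as Prin1 and Prin2 in the table, though nothing here says what the components mean.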
The above assumes your approach to the HHBC correlations, which is
that the "same words" indicate one author of the source material being
used, and "different words" indicate different authors of sources. There
is no need to make this assumption, however. If we assume, instead, that
the "same words" indicate one synoptist having redacted the source
material concerned, and "different words" as indicating more than one
synoptist having redacted, then a very different interpretation is
easily obtained --
#1 is weighted all equally because all three synoptists are redacting
basically the same source material.
#2 does indeed separate Mark from non-Mark, because Mark was the
synoptist who most faithfully retained the wording of the source
material he redacted. He redacted significantly, as shown by these
results, but not as heavily as either of the others.
#3 picks out that Matthew redacts more heavily than Mark (211, 221, 201,
and also to some extent in the positive minor agreements, 121) and that
Luke redacts about as heavily as Matthew in the narrative material in
the triple tradition (112, 122).
#4 clearly shows that Luke redacts his source material more heavily than
Matthew, and even more heavily than Mark.
#5 indicates that, although Mark redacts less heavily than Matthew or
Luke, he is a free spirit and has redacted to some extent, imposing
his style in a limited way on his source material.
I would suggest that the results of the principal component analysis you
give above can therefore be accounted for very neatly by assuming that
the "same words" indicate one synoptist has redacted, and "different
words" indicate that different synoptists have redacted.
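Whichever reading is preferred, both start from the same sign pattern on the second component, and that pattern can be checked directly against the quoted loadings. The short Python sketch below does only that. The Prin2 column is copied from the table above; the variable names are hypothetical, and the check says nothing about which interpretation is correct.

```python
# Prin2 loadings copied from the quoted table, keyed by category
# (the leading "F" is dropped).
prin2 = {
    "222": 0.079434, "211": -0.191359, "112": -0.149691, "221": 0.201322,
    "122": 0.268697, "121": 0.372056, "220": 0.323832, "120": 0.371185,
    "210": -0.228439, "020": 0.186352, "202": -0.406807, "201": -0.292340,
    "102": -0.225670, "002": -0.087882,
}

positive = sorted(c for c, w in prin2.items() if w > 0)
negative = sorted(c for c, w in prin2.items() if w < 0)

# Categories whose middle digit is 2, as described for component #2.
middle_two = sorted(c for c in prin2 if c[1] == "2")

# The two extremes ("poles") of the component.
highest = max(prin2, key=prin2.get)
lowest = min(prin2, key=prin2.get)

print("positive:", positive)
print("negative:", negative)
print("positive set equals middle-digit-2 set:", positive == middle_two)
print("poles:", lowest, "and", highest)
```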
>HOMEPAGE http://www.twonh.demon.co.uk/
>Rev B.E.Wilson, 10 York Close, Godmanchester, Huntingdon, Cambs, PE29 2EB, UK
> "What can be said at all can be said clearly; and whereof one cannot
> speak thereof one must be silent." Ludwig Wittgenstein, "Tractatus".
Synoptic-L Homepage: http://www.bham.ac.uk/theology/synoptic-l
List Owner: Synoptic-L-Owner@...