## Re: Testing the 3ST

Message 1 of 24 , Dec 15, 2007
--- In Synoptic@yahoogroups.com, "Dave Gentile" <gentile_dave@...>
wrote:
>
> O.K. - a back of the envelope calculation (or really some quick
> cutting and pasting with a spreadsheet) -
>

A correction to the quick calculation - I had the spreadsheet set for
a 90th percentile confidence range, not 95th. I also needed to double
the number I gave, for another reason. As a result, there is more like
a 10% chance these numbers are just random chance (not 2.5% as
previously stated). Apologies for the error.

So the result seems significant at the 90th percentile, but just
barely. However, this (combined with Ron's other observations) still
suggests to me that sQ and xQ, by and large, are the result of two
different processes.

Dave Gentile
Riverside, IL
Message 2 of 24 , Dec 16, 2007
Dave Gentile wrote:

> O.K. - a back of the envelope calculation

Dave,

Thanks for your efforts, but you may need to find another envelope - should
be plenty around at this time of year :-)

> (or really some quick cutting and pasting with a spreadsheet) -

> xQ:
>
> 18 blocks
> 1770 words
> average length 98 words
> 1602 possible 10 word agreements
> .......
> sQ:
> 57 blocks
> 2381 words
> average length 42 words
> 1881 possible 10 word agreements
> 12 actual agreements

Firstly, what I found was the set of strings common to Matthew and Luke
having *more than* ten contiguous words, i.e. 11+
Thus 1602 should be replaced by 1584 and 1881 by 1824.

Secondly you appear to be comparing apples and pears in the agreements. The
numbers 1584 and 1824 represent counts of the number of possible 11-word
strings (some of which will be overlapping). What I had counted were the
numbers and lengths of all the strings having more than ten words (none of
which overlap with each other by definition). The total number of words in
the xQ and sQ strings were 364 and 205 respectively. Therefore my actual
numbers of 11-word strings (some of which will overlap) are 364 - 10*23 =
134 and 205 - 10*12 = 85 respectively. So in xQ there are 134 contiguous
11-word strings out of a possible 1584, and in sQ there are 85 contiguous
11-word strings out of a possible 1824. (All this neglects the fact that the
blocks have different lengths, but I agree that the approximation that they
have equal lengths is unlikely to make much difference to the results.)
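
The arithmetic above can be checked with a short sketch. It uses only the figures quoted in this thread and the equal-block-length approximation; the function names are illustrative and not part of anyone's actual procedure.

```python
def possible_windows(blocks, avg_len, k):
    """Places a k-word string could start, treating every block as avg_len words long."""
    return blocks * (avg_len - k + 1)


def actual_windows(total_words, n_strings, k):
    """k-word windows inside the matched strings.

    A matched string of L >= k words contains L - k + 1 overlapping windows,
    so summing over all strings gives total_words - (k - 1) * n_strings.
    """
    return total_words - (k - 1) * n_strings


K = 11  # "more than ten contiguous words"

xq_possible = possible_windows(18, 98, K)
sq_possible = possible_windows(57, 42, K)
xq_actual = actual_windows(364, 23, K)
sq_actual = actual_windows(205, 12, K)

print("xQ:", xq_actual, "of", xq_possible)  # xQ: 134 of 1584
print("sQ:", sq_actual, "of", sq_possible)  # sQ: 85 of 1824
```

Setting `K = 10` in `possible_windows` gives back the earlier figures of 1602 and 1881.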

Ron Price

Derbyshire, UK

Web site: http://homepage.virgin.net/ron.price/index.htm
Message 3 of 24 , Dec 17, 2007
>
> > xQ:
> >
> > 18 blocks
> > 1770 words
> > average length 98 words
> > 1602 possible 10 word agreements
> > 23 actual agreements

> > .......
> > sQ:
> > 57 blocks
> > 2381 words
> > average length 42 words
> > 1881 possible 10 word agreements
> > 12 actual agreements

Ron:
>
> Firstly, what I found was the set of strings common to Matthew and Luke
> having *more than* ten contiguous words, i.e. 11+
> Thus 1602 should be replaced by 1584 and 1881 by 1824.
>

Dave:
O.K. I'll change the calculation from 10+ to 11+. I'd expect this is
a small effect.

Ron:
> Secondly you appear to be comparing apples and pears in the agreements. The
> numbers 1584 and 1824 represent counts of the number of possible 11-word
> strings (some of which will be overlapping). What I had counted were the
> numbers and lengths of all the strings having more than ten words (none of
> which overlap with each other by definition).

Dave:
I had given that some thought. Counting that way seems to greatly
inflate the significance, and I don't think it is correct, although
granted I did not formulate a precise argument as to why. Done the
way you suggest, you get something like
99.999 percentile significance, which does not seem to be the right
order of magnitude for the numbers we're dealing with. Plus,
considering a few extreme cases leads to absurd looking conclusions.
So, without precise argument, I conclude we should not count that
way.

Rather, I would put it this way - there are 1824 places a string
could start, and 12 places one actually does start.
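
The two counting conventions under discussion can be put side by side. This is a minimal sketch using only the figures quoted in this thread; the dictionary layout and names are illustrative.

```python
# Figures quoted in the thread for the 11+ cut-off.
counts = {
    "xQ": {"possible": 1584, "strings": 23, "windows": 134},
    "sQ": {"possible": 1824, "strings": 12, "windows": 85},
}


def rates(c):
    """Return (rate per start position, rate per 11-word window)."""
    return c["strings"] / c["possible"], c["windows"] / c["possible"]


for name, c in counts.items():
    start_rate, window_rate = rates(c)
    print(f"{name}: starts {start_rate:.4f}, windows {window_rate:.4f}")

# Ratio of xQ to sQ under each convention.
start_ratio = rates(counts["xQ"])[0] / rates(counts["sQ"])[0]
window_ratio = rates(counts["xQ"])[1] / rates(counts["sQ"])[1]
print(f"xQ/sQ ratio: starts {start_ratio:.2f}, windows {window_ratio:.2f}")
```

Both conventions show xQ with the higher rate; they differ mainly in how many "successes" are counted. Overlapping windows taken from the same string are not independent observations, which may be part of why treating each one as a separate success inflates the significance. That explanation is offered here as an assumption and is not argued in the thread.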

Then using the revised numbers, the finding is significant at the
89th percentile, just short of one typical arbitrary cut-off.

Here I should also note that I used a Bayesian credibility interval,
rather than a traditional confidence interval. They give nearly the
same result, although they say something subtly different. But in
this case if we are looking for that last 1%, the other method might
give results more to our liking, or it might be slightly worse.
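
To illustrate how close the two kinds of interval can be, here is a rough sketch for the sQ figures (12 starts in 1824 positions). It is not a reconstruction of the spreadsheet calculation and is not expected to reproduce the percentile quoted above. The uniform Beta(1, 1) prior, the 90% level, the normal-approximation (Wald) confidence interval, and the Monte Carlo sample size are all assumptions made for illustration.

```python
import random
from statistics import NormalDist

successes, trials = 12, 1824  # sQ: actual starts, possible starts
level = 0.90                  # assumed two-sided level
tail = (1 - level) / 2

# Bayesian credible interval: with a uniform Beta(1, 1) prior the posterior
# is Beta(successes + 1, failures + 1); its quantiles are estimated by sampling.
rng = random.Random(0)
n_draws = 20000
draws = sorted(
    rng.betavariate(successes + 1, trials - successes + 1) for _ in range(n_draws)
)
cred_lo = draws[int(tail * n_draws)]
cred_hi = draws[int((1 - tail) * n_draws) - 1]

# Traditional confidence interval via the normal (Wald) approximation.
p_hat = successes / trials
z = NormalDist().inv_cdf(1 - tail)
half_width = z * (p_hat * (1 - p_hat) / trials) ** 0.5
wald_lo, wald_hi = p_hat - half_width, p_hat + half_width

print(f"credible:   ({cred_lo:.4f}, {cred_hi:.4f})")
print(f"confidence: ({wald_lo:.4f}, {wald_hi:.4f})")
```

With counts this small the two intervals are similar but not identical, so a result sitting right at a cut-off could fall on either side depending on the method chosen.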

Finally, one other potential problem - How was the "11+" criterion
selected? Was that the first number you tried, or did you try other
string length cutoffs first?

Dave Gentile
Riverside, IL
Message 4 of 24 , Dec 18, 2007
Dave Gentile wrote:

> Then using the revised numbers, the finding is significant at the
> 89th percentile, just short of one typical arbitrary cut-off.

Dave,

Thanks for carrying out this investigation.

> Finally, one other potential problem - How was the "11+" criterion
> selected? Was that the first number you tried, or did you try other
> string length cutoffs first?

Good question. I first tried 18+ and realized there were so few strings that
the result was going to be too sensitive to the choice of cut-off. I wanted
to choose a cut-off which was significantly lower than 18+, yet not so low
as to necessitate too much effort (my procedure being part computerized and
part manual). It also had to be not too near 14 as I had already observed an
apparently more-than-average number of strings of this length with known
assignment, and didn't want the result to be biased. I had also by this
stage determined to use a single computer run, for which (as it happens) an
odd number cut-off was more 'efficient'. Hence the 11+.
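
Sensitivity to the cut-off could be explored by rerunning a search like the one sketched below with different values. This is a hypothetical illustration of finding contiguous word strings shared by two texts; it is not the part-computerized, part-manual procedure described above, and the function name and tokenized inputs are assumptions.

```python
def common_runs(a, b, cutoff=10):
    """Find maximal runs of identical contiguous words longer than `cutoff`.

    `a` and `b` are lists of words. Returns (start_in_a, start_in_b, length)
    tuples. Repeated passages may be reported more than once.
    """
    runs = []
    prev = [0] * (len(b) + 1)  # prev[j]: length of the match ending at a[i-2], b[j-1]
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                # The run is maximal if it cannot be extended by one more word.
                at_end = i == len(a) or j == len(b) or a[i] != b[j]
                if at_end and cur[j] > cutoff:
                    n = cur[j]
                    runs.append((i - n, j - n, n))
        prev = cur
    return runs


# Toy example: a shared 12-word passage embedded in different surroundings.
shared = [f"w{k}" for k in range(12)]
text_a = ["x"] + shared + ["y"]
text_b = ["z", "z"] + shared + ["q"]

print(common_runs(text_a, text_b, cutoff=10))  # [(1, 2, 12)]
print(common_runs(text_a, text_b, cutoff=12))  # []
```

Counting the runs returned for a range of cut-offs would show how quickly the number of strings falls as the cut-off rises.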

Ron Price

Derbyshire, UK

Web site: http://homepage.virgin.net/ron.price/index.htm