--- In Synoptic@yahoogroups.com, "Dave Gentile" <gentile_dave@...> wrote:
> O.K. - a back of the envelope calculation (or really some quick
> cutting and pasting with a spreadsheet) -
>

A correction to the quick calculation - I had the spreadsheet set for
a 90th percentile confidence range, not 95th. I also needed to double
the number I gave, for another reason. As a result, there is more like
a 10% chance these numbers are just random chance (not 2.5% as
previously stated). Apologies for the error.

So the result seems significant at the 90th percentile, but just
barely. However, this (combined with Ron's other observations) still
suggests to me that sQ and xQ, by and large, are the result of two
different processes.

Dave Gentile

Riverside, IL

Dave Gentile wrote:

> O.K. - a back of the envelope calculation

Dave,

Thanks for your efforts, but you may need to find another envelope - should
be plenty around at this time of year :-)

> (or really some quick cutting and pasting with a spreadsheet) -

Or another spreadsheet.

> xQ:
>
> 18 blocks
> 1770 words
> average length 98 words
> 1602 possible 10 word agreements
> 23 actual agreements
> .......
> sQ:
> 57 blocks
> 2381 words
> average length 42 words
> 1881 possible 10 word agreements
> 12 actual agreements

Firstly, what I found was the set of strings common to Matthew and Luke
having *more than* ten contiguous words, i.e. 11+.
Thus 1602 should be replaced by 1584 and 1881 by 1824.
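The figures in this exchange can be reproduced with a small sketch. It assumes, as the quoted numbers suggest, that every block is treated as having the rounded average length, so that a block of `avg_len` words offers `avg_len - n + 1` starting positions for an n-word string. The names `possible_starts`, `XQ` and `SQ` are illustrative only and do not come from the thread.

```python
def possible_starts(blocks: int, avg_len: int, n: int) -> int:
    """Approximate number of n-word windows, treating every block
    as exactly avg_len words long (an assumption, see above)."""
    return blocks * (avg_len - n + 1)

# Figures quoted in the thread: (number of blocks, average block length in words)
XQ = (18, 98)
SQ = (57, 42)

for name, (blocks, avg_len) in (("xQ", XQ), ("sQ", SQ)):
    ten = possible_starts(blocks, avg_len, 10)
    eleven = possible_starts(blocks, avg_len, 11)
    print(f"{name}: {ten} possible 10-word strings, {eleven} possible 11-word strings")
```

Under that assumption, moving the cut-off from 10 to 11 words removes one starting position per block, which is why the xQ figure drops by 18 and the sQ figure by 57.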


Secondly you appear to be comparing apples and pears in the agreements. The
numbers 1584 and 1824 represent counts of the number of possible 11-word
strings (some of which will be overlapping). What I had counted were the
numbers and lengths of all the strings having more than ten words (none of
which overlap with each other by definition). The total numbers of words in
the xQ and sQ strings were 364 and 205 respectively. Therefore my actual
numbers of 11-word strings (some of which will overlap) are 364 - 10*23 =
134 and 205 - 10*12 = 85 respectively. So in xQ there are 134 contiguous
11-word strings out of a possible 1584, and in sQ there are 85 contiguous
11-word strings out of a possible 1824. (All this neglects the fact that the
blocks have different lengths, but I agree that the approximation that they
have equal lengths is unlikely to make much difference to the results.)
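
The conversion from whole strings to overlapping 11-word windows can be checked with a short sketch. A string of L words contains L - 10 windows of 11 contiguous words, so summing over all strings gives the total word count minus 10 per string. The function name `overlapping_windows` is illustrative, and the formula assumes every string has at least `n` words, which holds here because only 11+ strings were counted.

```python
def overlapping_windows(total_words: int, num_strings: int, n: int = 11) -> int:
    """Count the n-word windows contained in a set of non-overlapping strings.

    Each string of length L contributes L - (n - 1) windows, so the total is
    the summed length minus (n - 1) for every string.
    """
    return total_words - (n - 1) * num_strings

# Totals quoted in the thread: (total words in matching strings, number of strings)
print("xQ:", overlapping_windows(364, 23))
print("sQ:", overlapping_windows(205, 12))
```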


Ron Price
Derbyshire, UK
Web site: http://homepage.virgin.net/ron.price/index.htm

> > xQ:
> >
> > 18 blocks
> > 1770 words
> > average length 98 words
> > 1602 possible 10 word agreements
> > 23 actual agreements
> > .......
> > sQ:
> > 57 blocks
> > 2381 words
> > average length 42 words
> > 1881 possible 10 word agreements
> > 12 actual agreements
>

Ron:
> Firstly, what I found was the set of strings common to Matthew and Luke
> having *more than* ten contiguous words, i.e. 11+
> Thus 1602 should be replaced by 1584 and 1881 by 1824.
>

Dave:
O.K. I'll change the calculation from 10+ to 11+. I'd expect this is
a small effect.


Ron:
> Secondly you appear to be comparing apples and pears in the agreements. The
> numbers 1584 and 1824 represent counts of the number of possible 11-word
> strings (some of which will be overlapping). What I had counted were the
> numbers and lengths of all the strings having more than ten words (none of
> which overlap with each other by definition).

Dave:
I had given that some thought. Counting that way seems to greatly
inflate the significance, and I don't think it is correct, although
granted I did not formulate a precise argument as to why it is
correct or not. Done the way you suggest, you get something like
99.999 percentile significance, which does not seem to be the right
order of magnitude for the numbers we're dealing with. Plus,
considering a few extreme cases leads to absurd-looking conclusions.
So, without a precise argument, I conclude we should not count that
way.

Rather, I would put it this way - there are 1824 places a string
could start, and 12 places one actually does start.

Then using the revised numbers, the finding is significant at the
89th percentile, just short of one typical arbitrary cut-off.
Regardless, it still adds something when combined with your other
arguments.
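
The difference between the two ways of counting can be illustrated with a rough sketch. It uses a plain two-proportion z-test, which is not the Bayesian calculation used in this thread and is not expected to reproduce the 89th-percentile figure. It only shows how much larger the test statistic becomes when overlapping windows are counted instead of string starts. The test assumes independent trials, and overlapping windows taken from the same string are not independent, which is one way to see why that count overstates the evidence. The function names are illustrative.

```python
import math

def two_proportion_z(k1: int, n1: int, k2: int, n2: int) -> float:
    """z statistic for the difference between two proportions (pooled variance)."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def one_sided_p(z: float) -> float:
    """Upper-tail probability of a standard normal variable exceeding z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Counting string starts: 23 of 1584 positions in xQ, 12 of 1824 in sQ
z_starts = two_proportion_z(23, 1584, 12, 1824)
# Counting overlapping 11-word windows: 134 of 1584 in xQ, 85 of 1824 in sQ
z_windows = two_proportion_z(134, 1584, 85, 1824)

print(f"string starts:       z = {z_starts:.2f}, one-sided p = {one_sided_p(z_starts):.3g}")
print(f"overlapping windows: z = {z_windows:.2f}, one-sided p = {one_sided_p(z_windows):.3g}")
```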


Here I should also note that I used a Bayesian credibility interval,
rather than a traditional confidence interval. They give nearly the
same result, although they say something subtly different. But in
this case if we are looking for that last 1%, the other method might
give results more to our liking, or it might be slightly worse.
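
For readers unfamiliar with credibility intervals, here is a minimal sketch of one way to compute them for the two agreement rates. It assumes a uniform Beta(1, 1) prior and an equal-tailed interval estimated by Monte Carlo sampling. The thread does not say which prior, interval type or comparison rule was actually used, so this is an illustration and not a reconstruction of the spreadsheet. The function name `credible_interval` and its parameters are hypothetical.

```python
import random

def credible_interval(k: int, n: int, level: float = 0.90,
                      draws: int = 20000, seed: int = 0) -> tuple[float, float]:
    """Equal-tailed credible interval for a rate, given k successes in n trials.

    Assumes a uniform Beta(1, 1) prior, so the posterior is Beta(k + 1, n - k + 1).
    The interval is read off sorted Monte Carlo draws from that posterior.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    samples = sorted(rng.betavariate(k + 1, n - k + 1) for _ in range(draws))
    tail = (1 - level) / 2
    lo = samples[int(tail * draws)]
    hi = samples[int((1 - tail) * draws) - 1]
    return lo, hi

# String starts per possible start position, using the revised 11+ figures
for name, k, n in (("xQ", 23, 1584), ("sQ", 12, 1824)):
    lo, hi = credible_interval(k, n)
    print(f"{name}: observed rate {k / n:.4f}, 90% interval {lo:.4f} to {hi:.4f}")
```

Changing `level` to 0.95 shows how sensitive any borderline conclusion is to the choice of interval width.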


Finally, one other potential problem - How was the "11+" criterion
selected? Was that the first number you tried, or did you try other
string length cutoffs first?

Dave Gentile

Riverside, IL

Dave Gentile wrote:

> Then using the revised numbers, the finding is significant at the
> 89th percentile, just short of one typical arbitrary cut-off.
> Regardless, it still adds something when combined with your other
> arguments.

Dave,

Thanks for carrying out this investigation.

> Finally, one other potential problem - How was the "11+" criterion
> selected? Was that the first number you tried, or did you try other
> string length cutoffs first?

Good question. I first tried 18+ and realized there were so few strings that
the result was going to be too sensitive to the choice of cut-off. I wanted
to choose a cut-off which was significantly lower than 18+, yet not so low
as to necessitate too much effort (my procedure being part computerized and
part manual). It also had to be not too near 14 as I had already observed an
apparently more-than-average number of strings of this length with known
assignment, and didn't want the result to be biased. I had also by this
stage determined to use a single computer run, for which (as it happens) an
odd number cut-off was more 'efficient'. Hence the 11+.

Ron Price
Derbyshire, UK
Web site: http://homepage.virgin.net/ron.price/index.htm