>
> > xQ:
> >
> > 18 blocks
> > 1770 words
> > average length 98 words
> > 1602 possible 10-word agreements
> > 23 actual agreements
> > .......
> > sQ:
> >
> > 57 blocks
> > 2381 words
> > average length 42 words
> > 1881 possible 10-word agreements
> > 12 actual agreements

Ron:

>

> Firstly, what I found was the set of strings common to Matthew and Luke
> having *more than* ten contiguous words, i.e. 11+.
> Thus 1602 should be replaced by 1584 and 1881 by 1824.

>

Dave:

O.K., I'll change the calculation from 10+ to 11+, though I'd expect
this to be a small effect.
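The counting rule behind the revised totals can be sketched in a few lines of Python. A block of n words offers n - k + 1 places where a k-word string could begin (none if the block is shorter than k), so moving from 10+ to 11+ removes exactly one place from every block of at least ten words. That matches the drops in the thread, one per block: 18 for xQ (1602 to 1584) and 57 for sQ (1881 to 1824). The function name and the block lengths below are hypothetical, chosen only for illustration; they are not the actual xQ or sQ data.

```python
def possible_starts(block_lengths, k):
    """Count the positions where a string of k consecutive words could begin.

    A block of n words has n - k + 1 such positions, or none if n < k.
    """
    return sum(max(0, n - k + 1) for n in block_lengths)


# Hypothetical block lengths, for illustration only (not the real xQ/sQ data).
blocks = [120, 95, 60, 8, 40]

ten_plus = possible_starts(blocks, 10)
eleven_plus = possible_starts(blocks, 11)
print(ten_plus, eleven_plus, ten_plus - eleven_plus)  # 279 275 4
```

The 8-word block contributes nothing at either cutoff, so only the four longer blocks each lose one position.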

Ron:

> Secondly, you appear to be comparing apples and pears in the agreements.
> The numbers 1584 and 1824 count the possible 11-word strings (some of
> which will overlap). What I had counted were the numbers and lengths of
> all the strings having more than ten words (none of which overlap with
> each other, by definition).
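The apples-and-pears point can be made concrete with a small sketch, assuming each word of a block has already been flagged as belonging to a Matthew-Luke common string or not (the flags and function names below are hypothetical). Counting maximal shared runs gives non-overlapping strings, as in Ron's tally; counting the k-word windows that fit inside those runs gives a number on the same overlapping footing as the 1584 and 1824 possible starts.

```python
from itertools import groupby


def maximal_runs(shared, k):
    """Lengths of maximal runs of shared words that are at least k words long.

    Maximal runs never overlap, so this counts each common string once.
    """
    lengths = [len(list(group)) for is_shared, group in groupby(shared) if is_shared]
    return [n for n in lengths if n >= k]


def shared_windows(shared, k):
    """Number of k-word windows lying entirely inside shared text.

    Windows overlap, like the 'possible starts' denominator does.
    """
    return sum(n - k + 1 for n in maximal_runs(shared, k))


# Hypothetical block: 25 shared words, 30 unshared, then 12 shared.
block = [True] * 25 + [False] * 30 + [True] * 12
print(len(maximal_runs(block, 11)), shared_windows(block, 11))  # 2 17
```

A single 25-word run counts once as a string but supplies 15 eleven-word windows, so the choice of numerator matters a great deal when the denominator counts overlapping starts.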

Dave:

I had given that some thought. Counting that way seems to greatly
inflate the significance, and I don't think it is correct, although
granted I have not formulated a precise argument as to why it is or
isn't. Done the way you suggest, you get something like 99.999
percentile significance, which does not seem to be the right order of
magnitude for the numbers we're dealing with. Plus, considering a few
extreme cases leads to absurd-looking conclusions. So, even without a
precise argument, I conclude we should not count that way.

Rather, I would put it this way: there are 1824 places where a string
could start, and 12 places where one actually does start.

Then, using the revised numbers, the finding is significant at the
89th percentile, just short of one typical (if arbitrary) cut-off.

Regardless, it still adds something when combined with your other
arguments.

Here I should also note that I used a Bayesian credibility interval
rather than a traditional confidence interval. They give nearly the
same result, although they say something subtly different. But in
this case, if we are looking for that last 1%, the other method might
give results more to our liking, or it might be slightly worse.
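One way to turn "1824 places a string could start, and 12 places one actually does start" into a Bayesian comparison is sketched below; it is not necessarily the calculation behind the 89th-percentile figure, whose exact procedure isn't given. It assumes a uniform Beta(1, 1) prior on each rate and treats every possible starting position as an independent trial, which is questionable given that the positions overlap. The function name and Monte Carlo settings are hypothetical.

```python
import random


def prob_rate_higher(hits_a, trials_a, hits_b, trials_b, draws=100_000, seed=1):
    """Monte Carlo estimate of P(rate_a > rate_b) under independent Beta posteriors.

    Assumes a uniform Beta(1, 1) prior on each rate and treats each possible
    starting position as an independent trial (both are modelling assumptions).
    """
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    wins = 0
    for _ in range(draws):
        # Posterior for a binomial rate with a Beta(1, 1) prior: Beta(hits + 1, misses + 1)
        rate_a = rng.betavariate(hits_a + 1, trials_a - hits_a + 1)
        rate_b = rng.betavariate(hits_b + 1, trials_b - hits_b + 1)
        if rate_a > rate_b:
            wins += 1
    return wins / draws


# Revised figures from the thread: xQ 23 of 1584, sQ 12 of 1824.
p = prob_rate_higher(23, 1584, 12, 1824)
print(f"P(xQ rate > sQ rate) ~ {p:.3f}")
```

For the traditional counterpart, a one-sided test on the same 2x2 table of agreements and non-agreements (for example `scipy.stats.fisher_exact` with `alternative='greater'`) would be one option, and comparing the two is a way to see whether the other method moves the result by that last 1%.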

Finally, one other potential problem: how was the "11+" criterion
selected? Was that the first number you tried, or did you try other
string-length cutoffs first?

Dave Gentile
Riverside, IL