Hello Emmanuel,

Comments in-line.

> Some questions to David.
>
> 1] Why a subtraction and not a division ?
>
> > So now we can compute the frequencies by category, relative
> > to what we would expect.
> > For "the" in category "222" we get .02 - .025 = -.005.
> > "the" occurs less frequently in "222" than it does in all categories.
> > For "bottom" in category "222" we get .005 - .004 = .001
>
> Why do you use the subtraction, and not the division, for that operation ?
> Your justification, with the word "relative", logically implies a division.
> You said elsewhere that you need to balance the high frequency of "the"
> but you can do it only by a division, for instance :
>
> For "the" in category "222" we get (.02 - .025) / 0.025 = -.2
> For "bottom" in category "222" we get (.005 - .004) / 0.004 = .25
>
> This operation would give a better relative representation. Do you think not ?
> For the time being, I do not understand the purpose of your subtraction.

We've already done one division to get to frequencies. I'm not sure what
impact your idea would have. However, the best way (as I did with the SAS
tool) is to compute the individual frequencies, then treat them as
variables, and treat the overall frequency as a partial variable, and do
what is called a "partial-correlation". It takes into account the standard
deviations, as well as doing, conceptually, about the same thing I did with
the subtraction.
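For readers who want to see the idea in concrete terms, here is a minimal sketch of a first-order partial correlation between two category frequency vectors, controlling for the overall frequency. It is not the SAS procedure mentioned above; the function names and all the numbers are assumptions made up for illustration.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Hypothetical per-word frequencies (one entry per word); not from the study.
overall = [0.025, 0.010, 0.004, 0.002, 0.001]
cat_a = [0.020, 0.012, 0.005, 0.002, 0.002]
cat_b = [0.022, 0.013, 0.004, 0.003, 0.001]

raw = pearson(cat_a, cat_b)
partial = partial_corr(cat_a, cat_b, overall)
print(f"raw r = {raw:.3f}, partial r = {partial:.3f}")
```

The raw correlation is high simply because both vectors follow the overall frequency; the partial correlation measures what is left once that shared dependence is removed.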

>

> 2] Imagine that hypothetical simple schema :
>
> A => Mt
> A + Mt => Lk
> A + Mt + Lk => Mk
>
> In 222, you may find :
> - words of A that have been kept by Mt, by Lk and Mk.
> - words of Mt that have been kept by Mk and Lk
>
> In 122 you may find :
> - words of A that have been changed by Mt, but kept by Lk, and Mk.
> - words of Lk, kept by Mk.
>
> Only for that simple schema, your basic vectors (122, 222, etc.)
> may be constituted by composite redactional features. Particularly,
> it may produce a combined correlation effect with different material
> of gospels that separately would have produced an anti-correlative
> effect.

If I understand you correctly, I agree. I've discovered many of the results
are very difficult to untangle. (See the note I just posted.)

>

> In other words, your data are not a complete overview of the styles of
> the gospels, since they do not take into account the location of
> occurrences (exactly as I cannot describe the demography of the USA just
> by giving a histogram of population by state). Your data are a snapshot
> of gospel styles, where a single apparent style profile may in fact
> hide three or four very different redaction flows.

True, they are all ground up; only the frequency of the words is being
studied. Although, I'm comparing that to what I know about the structure.

>

> 3] What is the method you choose for clustering ?
> I guess K-means, but can you confirm it ?

Average-linkage cluster analysis. It takes each member of each cluster and
compares it to each member of the other cluster to determine distance.
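As an illustration of the average-linkage rule just described, the sketch below computes the distance between two clusters as the mean of all pairwise member distances. Euclidean distance and the sample points are assumptions; the study may well have used a correlation-based distance instead.

```python
import math
from itertools import product

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def average_linkage_distance(cluster_a, cluster_b, dist=euclidean):
    """Mean of all pairwise distances between members of two clusters."""
    pairs = list(product(cluster_a, cluster_b))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

# Hypothetical 2-D points standing in for category vectors.
cluster_a = [(0.0, 0.0), (0.0, 2.0)]
cluster_b = [(3.0, 0.0), (3.0, 2.0)]

d = average_linkage_distance(cluster_a, cluster_b)
print(f"average-linkage distance = {d:.4f}")
```

In an agglomerative run, the two clusters with the smallest such distance are merged at each step, and the process repeats until one cluster remains.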

>

> Whatever the case, since it is possible to produce a correlative
> effect with a combination of anti-correlated pairs of vectors,
> your clustering is more negative than positive : you may not
> warrant that a cluster is a single style element, even if the
> Particularly, your cluster A can not be considered as a natural
> group of close elements.

I'm not sure I got the first part. I agree cluster A is a bit surprising.
I think I now understand the 202-222 result as both being the result of
being kept by both Matthew and Luke. The 200-222 connection is not at a high
confidence level, and from a scatter plot, it's hardly there at all. I think
the 202-200 connection may be the one real significant result of all this,
however.

>

> 4] On the other hand, the anti-correlative phenomenon looks like
> an unexpected effect. Can you assess how significant it is ?
> And is it possible to imagine a single writer producing a
> pattern of anti-correlative elements ?

See my previous note about this.

An example of what seems to happen:
22x and 12x have a negative correlation.
If Matthew dislikes a word, he will end up lowering the 22x frequency and
raising the 12x, tending to make them anti-correlate.
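A toy model can make this mechanism visible. The sketch below assumes Markan priority and only two categories: Mark's occurrences that Matthew keeps (standing in for 22x) and those he drops (standing in for 12x). The word counts and keep rates are invented for illustration and are not data from the study.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def frequencies(counts):
    """Turn raw counts into relative frequencies that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

# Hypothetical data: occurrences of five words in Mark, and the fraction of
# those occurrences Matthew keeps (a low value = a word Matthew dislikes).
mark_counts = [40, 25, 20, 10, 5]
kept_by_matthew = [0.9, 0.2, 0.6, 0.8, 0.3]

kept = [n * k for n, k in zip(mark_counts, kept_by_matthew)]           # 22x-like
dropped = [n * (1 - k) for n, k in zip(mark_counts, kept_by_matthew)]  # 12x-like

overall = frequencies(mark_counts)
dev_kept = [f - o for f, o in zip(frequencies(kept), overall)]
dev_dropped = [f - o for f, o in zip(frequencies(dropped), overall)]

r = pearson(dev_kept, dev_dropped)
print(f"correlation of deviations = {r:.3f}")
```

With only two categories the overall frequency is a weighted average of the two, so the deviations are forced to point in opposite directions and the anti-correlation is exact. With more categories that constraint is shared among them, so the same preference would be expected to show up less sharply.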

Thanks for the input,

Dave Gentile

Riverside, Illinois

M.S. Physics

PhD Management Science candidate

Synoptic-L Homepage: http://www.bham.ac.uk/theology/synoptic-l

List Owner: Synoptic-L-Owner@...

Hello again Emmanuel,

I thought about your first suggestion here, a little more.

> > 1] Why a subtraction and not a division ?
> >
> > > So now we can compute the frequencies by category, relative
> > > to what we would expect.
> > > For "the" in category "222" we get .02 - .025 = -.005.
> > > "the" occurs less frequently in "222" than it does in all categories.
> > > For "bottom" in category "222" we get .005 - .004 = .001
> >
> > Why do you use the subtraction, and not the division, for that operation ?
> > Your justification, with the word "relative", logically implies a
> > division.
> > You said elsewhere that you need to balance the high frequency of "the"
> > but you can do it only by a division, for instance :
> >
> > For "the" in category "222" we get (.02 - .025) / 0.025 = -.2
> > For "bottom" in category "222" we get (.005 - .004) / 0.004 = .25
> >
> > This operation would give a better relative representation. Do you think
> > not ?
> > For the time being, I do not understand the purpose of your
> > subtraction.

The current method gives more weight to common words. Your system would give

each word an equal weight. I think that might lead to more noise, since low

frequency words might not be very well distributed.
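The weighting difference can be checked directly with the two words quoted in the thread. This is only a sketch: the helper name `deviations` is an assumption, and the figures are the ones given above for "the" and "bottom" in category "222".

```python
def deviations(in_category, overall):
    """Return (difference, relative difference) for one word."""
    difference = in_category - overall   # the subtraction used in the study
    relative = difference / overall      # the division suggested instead
    return difference, relative

# (frequency in category "222", overall frequency), as quoted in the thread.
words = {"the": (0.02, 0.025), "bottom": (0.005, 0.004)}

results = {word: deviations(*freqs) for word, freqs in words.items()}
for word, (difference, relative) in results.items():
    print(f"{word:>6}: difference = {difference:+.4f}, relative = {relative:+.2f}")
```

Under subtraction the common word "the" has the larger deviation in absolute value, while under division the rarer word "bottom" does. That is the sense in which division puts every word on an equal footing and lets sparse, unevenly distributed words weigh as much as frequent ones.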

Thanks again,

Dave Gentile

Riverside, Illinois

M.S. Physics

PhD Management Science candidate

> > > Why do you use the subtraction, and not the division [...] ?

> > >
> > > Your justification, with the word "relative", logically implies a
> > > division. You said elsewhere that you need to balance the high
> > > frequency of "the" but you can do it only by a division, for
> > > instance :
> > >
> > > For "the" in category "222" we get (.02 - .025) / 0.025 = -.2
> > > For "bottom" in category "222" we get (.005 - .004) / 0.004 = .25
> > >
> > > This operation would give a better relative representation.
> > > Do you think not ?
> > > For the time being, I do not understand the purpose of your
> > > subtraction.
>
> The current method gives more weight to common words. Your system would give
> each word an equal weight. I think that might lead to more noise, since low
> frequency words might not be very well distributed.

OK. But in that case, what is the purpose of this subtraction ?
I still do not understand it.

a+

manu

Hello Emmanuel,

> OK. But in that case, what is the purpose of this subtraction ?
> I still do not understand it.
>
> a+
> manu

Without the subtraction we'd be asking:
"Do these documents have similar word frequencies?"

They do, because they are both samples of Greek language.

With the subtraction, we are asking:

"Do these documents depart from the average Greek language frequency, in a

similar way?"

Does that help any?
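The two questions can be put side by side in a small sketch. The entries for "the" and "bottom" in category "222" reuse the figures quoted earlier in the thread; the other two words and the whole second category are hypothetical values chosen for illustration.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Word order: "the", a hypothetical word, "bottom", another hypothetical word.
overall = [0.025, 0.010, 0.004, 0.002]
cat_222 = [0.020, 0.012, 0.005, 0.002]
cat_other = [0.030, 0.008, 0.003, 0.002]   # hypothetical second category

# Question 1: do the two categories have similar word frequencies?
raw_r = pearson(cat_222, cat_other)

# Question 2: do they depart from the overall frequency in a similar way?
dev_222 = [f - o for f, o in zip(cat_222, overall)]
dev_other = [f - o for f, o in zip(cat_other, overall)]
dev_r = pearson(dev_222, dev_other)

print(f"raw frequencies: r = {raw_r:.3f}")
print(f"deviations:      r = {dev_r:.3f}")
```

In this toy case the raw frequencies correlate strongly because both categories are dominated by the same common word, while the deviations are strongly anti-correlated. The subtraction is what exposes the second pattern.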

Dave Gentile

Riverside, Illinois

M.S. Physics

PhD Management Science
