## Re: [Synoptic-L] Some questions to David

Message 1 of 5 , Dec 4, 2001
Some questions to David.

1] Why a subtraction and not a division ?

> So now we can compute the frequencies by category, relative
> to what we would expect.
> For "the" in category "222" we get .02 - .025 = -.005.
> "the" occurs less frequently in "222" than it does in all categories.
> For "bottom" in category "222" we get .005 - .004 = .001

Why do you use subtraction rather than division for that operation ?
Your justification, with the word "relative", logically implies a division.
You said elsewhere that you need to balance the high frequency of "the",
but you can do that only with a division, for instance :

For "the" in category "222" we get (.02 - .025) / 0.025 = -.2
For "bottom" in category "222" we get (.005 - .004) / 0.004 = .25

This operation would give a better relative representation. Don't you think so ?
For the time being, I do not understand the purpose of your subtraction.
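The two calculations can be put side by side. This is a minimal Python sketch that uses only the four frequencies quoted above; the function names `absolute_deviation` and `relative_deviation` are illustrative labels, not terms from the original analysis.

```python
# Frequencies quoted in the message: frequency inside category "222"
# versus frequency across all categories.
freqs = {
    "the":    {"in_222": 0.02,  "overall": 0.025},
    "bottom": {"in_222": 0.005, "overall": 0.004},
}

def absolute_deviation(observed, expected):
    """The subtraction used in the original computation."""
    return observed - expected

def relative_deviation(observed, expected):
    """The division proposed here: deviation as a fraction of the expected value."""
    return (observed - expected) / expected

for word, f in freqs.items():
    a = absolute_deviation(f["in_222"], f["overall"])
    r = relative_deviation(f["in_222"], f["overall"])
    print(f"{word:>6}: absolute {a:+.4f}, relative {r:+.2f}")
```

Under subtraction "the" has the larger deviation in magnitude (-.005 against .001); under division "bottom" does (.25 against -.2).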

2] Imagine this simple hypothetical schema :

A => Mt
A + Mt => Lk
A + Mt + Lk => Mk

In 222, you may find :
- words of A that have been kept by Mt, by Lk and Mk.
- words of Mt that have been kept by Mk and Lk

In 122 you may find :
- words of A that have been changed by Mt, but kept by Lk, and Mk.
- words of Lk, kept by Mk.

Even with that simple schema, your basic vectors (122, 222, etc.)
may be made up of composite redactional features. In particular,
combining different gospel material may produce a correlation effect
where each component, taken separately, would have produced an
anti-correlation effect.

In other words, your data are not a complete overview of the styles of
the gospels, since they do not take into account the location of
occurrences (just as I cannot describe the demography of the USA merely
by giving a histogram of population by state). Your data are a snapshot
of gospel styles, where what appears to be a single style profile may in
fact hide three or four very different redaction flows.

3] Which clustering method did you choose ?
I guess k-means, but can you confirm it ?

Whatever the case, since it is possible to produce a correlative
effect with a combination of anti-correlated pairs of vectors,
your clustering is more negative than positive : you cannot
guarantee that a cluster is a single style element, even if the
In particular, your cluster A cannot be considered a natural
group of close elements.

4] Conversely, the anti-correlation phenomenon looks like
an unexpected effect. Can you assess how significant it is ?
And is it possible to imagine a single writer producing a
pattern of anti-correlated elements ?

a+
manu

Synoptic-L Homepage: http://www.bham.ac.uk/theology/synoptic-l
List Owner: Synoptic-L-Owner@...
Message 2 of 5 , Dec 4, 2001
Hello Emmanuel,

>
> Some questions to David.
>
> 1] Why a subtraction and not a division ?
>
> > So now we can compute the frequencies by category, relative
> > to what we would expect.
> > For "the" in category "222" we get .02 - .025 = -.005.
> > "the" occurs less frequently in "222" than it does in all categories.
> > For "bottom" in category "222" we get .005 - .004 = .001
>
> Why do you use the subtraction, and not the division, for that operation ?
> Your justification, with the word "relative", induce logically a division.
> You said elsewhere that you need to balance the high frequency of "the"
> but you can do it only by a division, for instance :
>
> For "the" in category "222" we get (.02 - .025) / 0.025 = -.2
> For "bottom" in category "222" we get (.005 - .004) / 0.004 = .25
>
> This operation would give a better relative representation. Do you think not ?
> For that time being, I do not understand the purpose of your subtraction.
>

We've already done one division to get to frequencies. I'm not sure what
impact your idea would have. However, the best way (as I did with the SAS
tool) is to compute the individual frequencies, treat them as variables,
treat the overall frequency as a partial variable, and do what is called a
"partial correlation". It takes the standard deviations into account, and
conceptually it does about the same thing I did with the subtraction.
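For readers unfamiliar with the term, a first-order partial correlation can be sketched as follows. This is a minimal Python illustration using the textbook formula, not the SAS procedure itself; the three frequency lists are invented, and the names `cat_a`, `cat_b` and `overall` are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def partial_correlation(x, y, z):
    """Correlation of x and y after removing what each shares linearly with z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Invented per-word frequencies (one entry per word), for illustration only.
overall = [0.025, 0.004, 0.010, 0.002, 0.015]  # frequency across all categories
cat_a   = [0.020, 0.005, 0.012, 0.001, 0.016]  # frequency in one category
cat_b   = [0.028, 0.003, 0.009, 0.003, 0.013]  # frequency in another category

print("raw correlation:    ", pearson(cat_a, cat_b))
print("partial correlation:", partial_correlation(cat_a, cat_b, overall))
```

The overall frequency plays the role of the "partial variable": its influence on both categories is removed before the two are compared.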

>
> 2] Imagine that hypothetical simple schema :
>
> A => Mt
> A + Mt => Lk
> A + Mt + Lk => Mk
>
> In 222, you may find :
> - words of A that have been kept by Mt, by Lk and Mk.
> - words of Mt that have been kept by Mk and Lk
>
> In 122 you may find :
> - words of A that have been changed by Mt, but kept by Lk, and Mk.
> - words of Lk, kept by Mk.
>
> Only for that simple schema, your basic vectors (122, 222, etc.)
> may be constituted by composite redactional features. Particularly,
> it may produce a combined correlation effect with different material
> of gospels that separately would have produced an anti-correlative
> effect.

If I understand you correctly, I agree. I've discovered many of the results
are very difficult to untangle. (See the note I just posted.)

>
> With other words, your data are not a complete overview of styles of
> gospels, since it does not take into account location of occurrences.
> (exactly as I can not describe the demography of USA just by giving an
> histogram of population according states). Your data are a snap shot
> of gospel styles, where a single appearing style profile may in fact
> hide three of four very different redaction flows.

True, they are all ground up; only the frequency of the words is being
studied. However, I'm comparing that to what I know about the structure.

>
>
> 3] What is the method you choose for clustering ?
> I guess K-mean, but may you confirm it ?

Average-linkage cluster analysis. It takes each member of each cluster and
compares it to each member of the other cluster to determine distance.
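The distance rule described here can be sketched in a few lines of Python. The sketch assumes Euclidean distance between points, which is an assumption: the distance measure actually used in the study is not stated in this thread. The function name and sample points are hypothetical.

```python
from itertools import product
from math import dist

def average_linkage_distance(cluster_a, cluster_b):
    """Mean of the distances between every member of one cluster
    and every member of the other (Euclidean distance assumed)."""
    pairs = list(product(cluster_a, cluster_b))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

# Two invented clusters of 2-D points.
a = [(0.0, 0.0), (0.0, 1.0)]
b = [(3.0, 0.0), (3.0, 1.0)]
print(average_linkage_distance(a, b))
```

In agglomerative clustering, the two clusters with the smallest such distance are merged at each step. Unlike k-means, no number of clusters has to be fixed in advance.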

>
> Whatever the case, since it is possible to produce a correlative
> effect with a combination of anti-correlated pairs of vectors,
> your clustering is more negative than positive : you may not
> warrant that a cluster is a single style element, even if the
> Particularly, your cluster A can not be considered as a natural
> groups of close elements.

I'm not sure I got the first part. I agree cluster A is a bit surprising.
I think I now understand the 202-222 result as both being the result of
being kept by both Matthew and Luke. The 200-222 connection is not at a high
confidence level, and from a scatter plot, it's hardly there at all. I think
the 202-200 connection may be the one real significant result of all this,
however.

>
> 4] On the opposite, the anti-correlative phenomenon looks as
> an unexpected effect. May you assess how significant it is ?
> And is it possible to imagine a single writer producing a
> pattern of anti-correlative elements ?

An example of what seems to happen:
22x and 12x have a negative correlation.
If Matthew dislikes a word, he will end up lowering the 22x frequency and
raising the 12x, tending to make them anti-correlate.
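A toy calculation shows the mechanism. It is a sketch under invented assumptions: five hypothetical words, made-up source counts, and a made-up rate at which Matthew keeps each word. It uses raw counts rather than the frequency deviations of the actual study.

```python
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

# Invented data: occurrences of five words in material shared with the
# other two gospels, and the fraction of those occurrences Matthew keeps.
source_counts = [100, 80, 120, 100, 60]
keep_rates    = [0.9, 0.7, 0.5, 0.3, 0.1]  # low rate = a word Matthew dislikes

# A kept occurrence lands in 22x; a changed one lands in 12x.
counts_22x = [n * k for n, k in zip(source_counts, keep_rates)]
counts_12x = [n * (1 - k) for n, k in zip(source_counts, keep_rates)]

r_22x_12x = pearson(counts_22x, counts_12x)
print(r_22x_12x)  # negative: what 22x loses, 12x gains
```

Every occurrence removed from 22x reappears in 12x, so one writer's preferences alone are enough to push the two categories in opposite directions.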

Thanks for the input,

Dave Gentile
Riverside, Illinois
M.S. Physics
PhD Management Science candidate

Message 3 of 5 , Dec 4, 2001
Hello again Emmanuel,

> > 1] Why a subtraction and not a division ?
> >
> > > So now we can compute the frequencies by category, relative
> > > to what we would expect.
> > > For "the" in category "222" we get .02 - .025 = -.005.
> > > "the" occurs less frequently in "222" than it does in all categories.
> > > For "bottom" in category "222" we get .005 - .004 = .001
> >
> > Why do you use the subtraction, and not the division, for that operation ?
> > Your justification, with the word "relative", induce logically a division.
> > You said elsewhere that you need to balance the high frequency of "the"
> > but you can do it only by a division, for instance :
> >
> > For "the" in category "222" we get (.02 - .025) / 0.025 = -.2
> > For "bottom" in category "222" we get (.005 - .004) / 0.004 = .25
> >
> > This operation would give a better relative representation. Do you think not ?
> > For that time being, I do not understand the purpose of your subtraction.
> >

The current method gives more weight to common words. Your system would give
each word an equal weight. I think that might lead to more noise, since low
frequency words might not be very well distributed.
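The weighting argument can be made concrete. The sketch below assumes a hypothetical category of 1,000 words and two invented words, one common and one rare, and asks how much a single extra chance occurrence moves each measure.

```python
def absolute_deviation(count, total, expected_freq):
    """Subtraction: observed frequency minus expected frequency."""
    return count / total - expected_freq

def relative_deviation(count, total, expected_freq):
    """Division: the same deviation as a fraction of the expected frequency."""
    return (count / total - expected_freq) / expected_freq

total = 1000  # hypothetical number of words in the category

# (expected frequency, observed count) for two invented words
common = (0.025, 25)
rare = (0.001, 1)

def shift_from_one_extra(measure, expected_freq, count):
    """Change in the measure caused by one additional occurrence."""
    return measure(count + 1, total, expected_freq) - measure(count, total, expected_freq)

abs_shift_common = shift_from_one_extra(absolute_deviation, *common)
abs_shift_rare = shift_from_one_extra(absolute_deviation, *rare)
shift_common = shift_from_one_extra(relative_deviation, *common)
shift_rare = shift_from_one_extra(relative_deviation, *rare)

print(abs_shift_common, abs_shift_rare)  # equal under subtraction
print(shift_common, shift_rare)          # far larger for the rare word under division
```

With subtraction, one stray occurrence moves both words by the same small amount. With division, the rare word's value moves by its entire expected frequency, which is the kind of noise referred to above.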

Thanks again,

Dave Gentile
Riverside, Illinois
M.S. Physics
PhD Management Science candidate

Message 4 of 5 , Dec 5, 2001
> > > Why do you use the subtraction, and not the division [...] ?
> > >
> > > Your justification, with the word "relative", induce logically a
> > > division. You said elsewhere that you need to balance the high
> > > frequency of "the" but you can do it only by a division, for
> > > instance :
> > >
> > > For "the" in category "222" we get (.02 - .025) / 0.025 = -.2
> > > For "bottom" in category "222" we get (.005 - .004) / 0.004 = .25
> > >
> > > This operation would give a better relative representation.
> > > Do you think not ?
> > > For that time being, I do not understand the purpose of your
> > > subtraction.
>
> The current method gives more weight to common words. Your system would give
> each word an equal weight. I think that might lead to more noise, since low
> frequency words might not be very well distributed.

OK. But in that case, what is the purpose of this subtraction ?
I still do not understand it.

a+
manu

Message 5 of 5 , Dec 5, 2001
Hello Emmanuel,

>
> OK. But in that case, what is the purpose of this subtraction ?
> I still do not understand it.
>
> a+
> manu

Without the subtraction we'd be asking:
"Do these documents have similar word frequencies?"
They do, because they are both samples of the Greek language.

With the subtraction, we are asking:
"Do these documents depart from the average Greek language frequency, in a
similar way?"
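The difference between the two questions shows up clearly in a small invented example. In this Python sketch, two hypothetical categories both follow the overall frequencies closely but depart from them in exactly opposite directions; all numbers are made up for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

# Invented per-word frequencies across all categories.
overall = [0.025, 0.004, 0.010, 0.002, 0.015]
# Invented departures from the overall frequencies.
d = [0.002, -0.001, 0.001, -0.0005, -0.0015]

cat_a = [o + x for o, x in zip(overall, d)]  # departs one way
cat_b = [o - x for o, x in zip(overall, d)]  # departs the opposite way

# Without the subtraction: both look alike, because both are Greek.
raw_r = pearson(cat_a, cat_b)

# With the subtraction: compare only the departures from the average.
dev_a = [a - o for a, o in zip(cat_a, overall)]
dev_b = [b - o for b, o in zip(cat_b, overall)]
dev_r = pearson(dev_a, dev_b)

print(raw_r, dev_r)
```

The raw correlation is strongly positive, while the correlation of the deviations is strongly negative. The shared background of the language hides the difference until it is subtracted out.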

Does that help any?

Dave Gentile
Riverside, Illinois
M.S. Physics
PhD Management Science
