## Re: [Synoptic-L] Some questions to David

Message 1 of 5, Dec 4, 2001
Hello again Emmanuel,

I thought about your first suggestion here a little more.

> > 1] Why a subtraction and not a division?
> >
> > > So now we can compute the frequencies by category, relative
> > > to what we would expect.
> > > For "the" in category "222" we get .02 - .025 = -.005.
> > > "the" occurs less frequently in "222" than it does in all categories.
> > > For "bottom" in category "222" we get .005 - .004 = .001
> >
> > Why do you use subtraction, and not division, for that operation?
> > Your justification, with the word "relative", logically implies a
> > division. You said elsewhere that you need to balance the high
> > frequency of "the", but you can do that only by a division, for
> > instance:
> >
> > For "the" in category "222" we get (.02 - .025) / 0.025 = -.2
> > For "bottom" in category "222" we get (.005 - .004) / 0.004 = .25
> >
> > This operation would give a better relative representation. Do you
> > not think so?
> > For the time being, I do not understand the purpose of your
> > subtraction.
> >

The current method gives more weight to common words. Your system would give
each word an equal weight. I think that might lead to more noise, since low
frequency words might not be very well distributed.
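
To make the contrast concrete, here is a minimal Python sketch that uses only the four frequencies quoted above. The function names `difference` and `relative_difference` are illustrative and are not taken from the actual analysis code; the sketch only compares how large each word's score is under the two operations.

```python
# Frequencies quoted in the thread:
# word -> (frequency in category "222", frequency over all categories)
freqs = {"the": (0.02, 0.025), "bottom": (0.005, 0.004)}

def difference(cat, overall):
    """Subtraction: absolute departure from the overall frequency."""
    return cat - overall

def relative_difference(cat, overall):
    """Division: departure expressed as a fraction of the overall frequency."""
    return (cat - overall) / overall

for word, (cat, overall) in freqs.items():
    print(word,
          round(difference(cat, overall), 4),
          round(relative_difference(cat, overall), 4))
```

With subtraction, "the" has the larger score in absolute value (.005 against .001), so the common word dominates. With division, "bottom" has the larger score (.25 against .2), so a rare word counts at least as much as a common one.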

Thanks again,

Dave Gentile
Riverside, Illinois
M.S. Physics
PhD Management Science candidate

Synoptic-L Homepage: http://www.bham.ac.uk/theology/synoptic-l
List Owner: Synoptic-L-Owner@...
Message 2 of 5, Dec 5, 2001
> > > Why do you use subtraction, and not division [...]?
> > >
> > > Your justification, with the word "relative", logically implies a
> > > division. You said elsewhere that you need to balance the high
> > > frequency of "the", but you can do that only by a division, for
> > > instance:
> > >
> > > For "the" in category "222" we get (.02 - .025) / 0.025 = -.2
> > > For "bottom" in category "222" we get (.005 - .004) / 0.004 = .25
> > >
> > > This operation would give a better relative representation.
> > > Do you not think so?
> > > For the time being, I do not understand the purpose of your
> > > subtraction.
>
> The current method gives more weight to common words. Your system would give
> each word an equal weight. I think that might lead to more noise, since low
> frequency words might not be very well distributed.

OK. But in that case, what is the purpose of this subtraction?
I still do not understand it.

a+
manu

Message 3 of 5, Dec 5, 2001
Hello Emmanuel,

>
> OK. But in that case, what is the purpose of this subtraction?
> I still do not understand it.
>
> a+
> manu

Without the subtraction we'd be asking:
"Do these documents have similar word frequencies?"
They do, because they are both samples of the Greek language.

With the subtraction, we are asking:
"Do these documents depart from the average Greek-language frequencies
in a similar way?"
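
A small Python sketch of the difference between the two questions. Everything in it is an assumption made for illustration: the frequencies are made-up numbers, not data from the study, and similarity is measured here with a Pearson correlation, which this thread does not specify.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up frequencies of four words (hypothetical, for illustration only).
baseline = [0.050, 0.020, 0.005, 0.001]  # average over all categories
doc_a    = [0.052, 0.018, 0.006, 0.001]
doc_b    = [0.048, 0.022, 0.004, 0.001]

# Question 1: do the documents have similar word frequencies?
raw_similarity = pearson(doc_a, doc_b)

# Question 2: do they depart from the average in a similar way?
dev_a = [a - m for a, m in zip(doc_a, baseline)]
dev_b = [b - m for b, m in zip(doc_b, baseline)]
deviation_similarity = pearson(dev_a, dev_b)

print(raw_similarity, deviation_similarity)
```

In this toy case the raw frequencies correlate almost perfectly, simply because both lists follow the shared baseline, while the deviations are negatively correlated because the two documents depart from the baseline in opposite directions.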

Does that help any?

Dave Gentile
Riverside, Illinois
M.S. Physics
PhD Management Science
