Hello again Emmanuel,

> > 1] Why a subtraction and not a division ?
> >
> > > So now we can compute the frequencies by category, relative
> > > to what we would expect.
> > > For "the" in category "222" we get .02 - .025 = -.005.
> > > "the" occurs less frequently in "222" than it does in all categories.
> > > For "bottom" in category "222" we get .005 - .004 = .001
> >
> > Why do you use subtraction rather than division for that operation?
> > Your justification, with the word "relative", logically implies a division.
> > You said elsewhere that you need to balance the high frequency of "the"
> > but you can do it only by a division, for instance :
> >
> > For "the" in category "222" we get (.02 - .025) / 0.025 = -.2
> > For "bottom" in category "222" we get (.005 - .004) / 0.004 = .25
> >
> > This operation would give a better relative representation. Don't you
> > think so?
> > For the time being, I do not understand the purpose of your subtraction.
> >

The current method gives more weight to common words. Your system would give
each word an equal weight. I think that might lead to more noise, since
low-frequency words might not be very well distributed.
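
To make the difference in weighting concrete, here is a small sketch that
applies both operations to the two words from the example quoted above. The
function names are made up for illustration; only the four frequencies come
from the thread.

```python
# Frequencies taken from the example quoted above.
category_freq = {"the": 0.020, "bottom": 0.005}  # frequency within category "222"
overall_freq = {"the": 0.025, "bottom": 0.004}   # frequency across all categories


def difference(word):
    """Subtraction: deviation measured in absolute frequency units."""
    return category_freq[word] - overall_freq[word]


def relative_difference(word):
    """Division: deviation as a fraction of the expected frequency."""
    return (category_freq[word] - overall_freq[word]) / overall_freq[word]


for word in category_freq:
    print(f"{word:>6}: difference = {difference(word):+.3f}, "
          f"relative = {relative_difference(word):+.2f}")
```

With subtraction, the deviation for "the" is five times larger in magnitude
than the one for "bottom" (.005 against .001), so the common word dominates.
With division the two magnitudes are about the same (.2 against .25), so each
word counts roughly equally, however rare it is.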

Thanks again,

Dave Gentile
Riverside, Illinois
M.S. Physics
PhD Management Science candidate

OK. But in that case, what is the purpose of this subtraction?
I still do not understand it.

a+
manu

Hello Emmanuel,

>
> OK. But in that case, what is the purpose of this subtraction?
> I still do not understand it.
>
> a+
> manu

Without the subtraction we'd be asking:
"Do these documents have similar word frequencies?"
They do, because they are both samples of Greek language.

With the subtraction, we are asking:
"Do these documents depart from the average Greek language frequency, in a
similar way?"
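
A rough sketch of the two questions, assuming similarity is measured with a
Pearson correlation (an assumption made for this illustration, not something
stated in this message). The word frequencies below are invented; they are not
taken from any real text.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical frequencies for four words, invented for illustration only.
average = [0.025, 0.010, 0.004, 0.001]  # whole-language average
doc_a = [0.020, 0.012, 0.005, 0.001]
doc_b = [0.028, 0.009, 0.003, 0.002]

# Without the subtraction: compare the raw frequencies.
raw_similarity = pearson(doc_a, doc_b)

# With the subtraction: compare the departures from the average.
dev_a = [a - m for a, m in zip(doc_a, average)]
dev_b = [b - m for b, m in zip(doc_b, average)]
deviation_similarity = pearson(dev_a, dev_b)

print(f"raw frequencies: {raw_similarity:+.2f}")
print(f"deviations:      {deviation_similarity:+.2f}")
```

In this made-up case the raw frequencies correlate strongly, simply because
both documents use the common words often, while the deviations correlate
negatively: the two documents depart from the average in opposite directions.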

Does that help any?

Dave Gentile
Riverside, Illinois
M.S. Physics
PhD Management Science

