
• Re: [LingPipe] Classifying scores [i.e. scalar classification]

Message 1 of 4, Apr 2, 2007
seth_a_farrington wrote:
> Hi! I'm working on a system to try to automate the analysis of
> customer satisfaction based on a database of their e-mail
> correspondences. So far we've had good success with using a
> DynamicLMClassifier to predict whether the overall satisfaction is
> "positive", "negative", or "neutral".

> However, we would like to expand the system to provide a satisfaction
> "score" going from -100 to 100. We considered making 200 different
> categories for the classifier and just using that to assign the score,
> but that didn't work very well, probably because the classifier didn't
> have any knowledge that a category of "100" is related to a category
> of "99".
>
> Does anyone have any suggestions for how LingPipe's classification
> system might be used to assign a range of scores? Is it possible to
> use the JointClassification.score(int) method for something like this?

We've had requests to do other scalar classifiers,
like reading level classification (on a grade-in-school
scale).

This is a general problem in statistical smoothing
when you have prior knowledge of the relation of outcomes
to each other. But what we really need to do is to
clarify if the scale has any meaning. And that'll rely
on assumptions about the underlying problem.

You can use LingPipe to do this on a probabilistic scale as
follows. Assume -100 is 0% probability of positive,
+100 is 100% probability of positive, and 0 is
50% probability of positive (that is, we can't tell):

score(text) = 200 * p(positive|text) - 100
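As a minimal sketch of that mapping in plain Java (no LingPipe calls; the conditional probability is just a placeholder argument):

```java
// Map a conditional probability p(positive|text) in [0,1] onto the
// -100..100 satisfaction scale: 0.5 lands at 0 ("can't tell").
public class ScalarScore {
    static double score(double pPositive) {
        return 200.0 * pPositive - 100.0;
    }
    public static void main(String[] args) {
        System.out.println(score(0.5)); // 0.0
        System.out.println(score(1.0)); // 100.0
        System.out.println(score(0.0)); // -100.0
    }
}
```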

Now the problem with doing this is that the probability
estimates p are based on Markovian (naive Bayes)
assumptions about independence, and most scores for
text of any length will wind up being very close to
-100 or +100. To overcome this problem, you need to
fudge the scores as is common in the speech reco world
(for the same reason -- scaling a model with erroneous
independence assumptions). So rather than using
the probability p(positive|text) from the classifier
directly, compute length-normalized log estimates:

hPos = log p(positive,text) / length(text)
hNeg = log p(negative,text) / length(text)

pPos = 2 ** hPos
pNeg = 2 ** hNeg

These are what you get from the score() method on
the JointClassification returned by an LM classifier.

Now, to compute scores, what you want to do is this:

score(text) = 200 * pPos / ( pPos + pNeg) - 100

This should give you scores that are "reasonable", but
have no probabilistic interpretation.
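Putting the fudge together as a sketch: the per-character log (base 2) estimates hPos and hNeg stand in for whatever you pull out of the classifier's JointClassification, and the values in main are made up for illustration.

```java
// Length-normalized scoring: exponentiate per-character log2
// estimates, then rescale their ratio to -100..100. Because the
// logs are divided by text length, long texts no longer get pinned
// to the extremes.
public class FudgedScore {
    // hPos, hNeg: log2 joint probability divided by length(text)
    static double score(double hPos, double hNeg) {
        double pPos = Math.pow(2.0, hPos);
        double pNeg = Math.pow(2.0, hNeg);
        return 200.0 * pPos / (pPos + pNeg) - 100.0;
    }
    public static void main(String[] args) {
        // equal per-char fits -> 0 (neutral)
        System.out.println(score(-2.0, -2.0)); // 0.0
        // positive model fits one bit per char better -> leans positive
        System.out.println(score(-2.0, -3.0)); // about 33.3
    }
}
```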

Note that this only requires two models -- there is
no neutral model.

But is positive/negative really a one-dimensional scale
with neutral in the middle?

A better model might be a two-dimensional one where
each e-mail can be positive to some degree and negative
to some degree, giving it a score (pos,neg) where
a score of (100,0) would be a love letter and
(0,100) a hate letter, with (100,100) a good cop/bad
cop email, and (0,0) a very neutral business letter.

These numbers could be estimated in the same way as
above, but with a fixed score for the opposite category
as you'll find in the BinaryLMClassifier implementation.
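A sketch of that two-dimensional idea, scoring each axis against a fixed rejecting score in the spirit of a binary accept/reject classifier. The axis() helper, the fixed threshold, and all the numbers are illustrative assumptions, not LingPipe API:

```java
// Two-dimensional (pos, neg) scoring: each axis has its own accept
// model, scored against a fixed per-character log2 estimate for the
// rejecting model, and rescaled to 0..100.
public class TwoDimScore {
    // hAccept: per-char log2 estimate under the axis's accept model
    // hReject: fixed per-char log2 estimate for the rejecting model
    static double axis(double hAccept, double hReject) {
        double pAccept = Math.pow(2.0, hAccept);
        double pReject = Math.pow(2.0, hReject);
        return 100.0 * pAccept / (pAccept + pReject);
    }
    public static void main(String[] args) {
        double hReject = -3.0; // fixed threshold, made up
        // an email fitting the positive model well, the negative one poorly
        double pos = axis(-2.0, hReject); // about 66.7
        double neg = axis(-5.0, hReject); // 20.0
        System.out.println("(" + pos + ", " + neg + ")");
    }
}
```

A love letter would then score near (100, 0), a good cop/bad cop email near (100, 100), and a flat business letter near (0, 0).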

In any of these cases, what you get isn't really
how positive or how negative something is, but rather
how likely it is to contain positive sentiment and
how likely it is to contain negative sentiment.

You might want to look at Pang and Lee's paper on
assigning scores, too, but they're also looking
at it as a multi-class classification (easier with
five stars than a -100 to 100 scale):

http://www.cs.cornell.edu/people/pabo/papers/acl05_neutral.pdf

Their more recent paper on congressional opinion
also looks interesting:

http://www.cs.cornell.edu/people/pabo/papers/emnlp06_convote.pdf

I'd guess there's been a lot of work on this problem
since then, and I'd love to hear what you find.

- Bob Carpenter
Alias-i
Message 1 of 4, Apr 2, 2007
Hi All,

I am new to dynamic language model classification.

In fact, my knowledge of how to use LingPipe for classification is
very limited, so I am trying to understand it through examples.

I would appreciate it if someone could tell me whether LingPipe has
been tried before for translated-document classification. In other
words, there are two sets of documents: a set (s1) of documents in
language "l1" and a set (s2) in language "l2", and any document from
"s1" has a corresponding document in "s2".

Is there any method to detect this kind of relation, even by using a
dictionary? I would much prefer to do it without any dictionaries,
though.

I deeply appreciate any kind of help.

Thanks

---------
Y. Bey

Message 1 of 4, Apr 3, 2007
> I am new to dynamic language model classification.
>
> In fact, my knowledge of how to use LingPipe for classification is
> very limited, so I am trying to understand it through examples.
>
> I would appreciate it if someone could tell me whether LingPipe has
> been tried before for translated-document classification. In other
> words, there are two sets of documents: a set (s1) of documents in
> language "l1" and a set (s2) in language "l2", and any document from
> "s1" has a corresponding document in "s2".
>
> Is there any method to detect this kind of relation, even by using a
> dictionary? I would much prefer to do it without any dictionaries,
> though.
>
> I deeply appreciate any kind of help.

Let me reformulate your question to see if I
understand it. You have a set S1 of docs
in language L1 and a set S2 of docs in language L2.
Now you want to know for a given document
d in S1 if there is a corresponding document
d' in S2. Presumably the question is whether they
are translation-equivalent in some sense.