> Hi! I'm working on a system to try to automate the analysis of
> customer satisfaction based on a database of their e-mail
> correspondences. So far we've had good success with using a
> DynamicLMClassifier to predict whether the overall satisfaction is
> "positive", "negative", or "neutral".
> However, we would like to expand the system to provide a satisfaction
> "score" going from -100 to 100. We considered making 200 different
> categories for the classifier and just using that to assign the score,
> but that didn't work very well, probably because the classifier didn't
> have any knowledge that a category of "100" is related to a category
> of "99".
> Does anyone have any suggestions for how LingPipe's classification
> system might be used to assign a range of scores? Is it possible to
> use the JointClassification.score(int) method for something like this?
We've had requests to do other scalar classifiers,
like reading level classification (on a grade-in-school scale).
This is a general problem in statistical smoothing
when you have prior knowledge of how the outcomes relate
to each other. But what we really need to do first is
clarify whether the scale has any meaning, and that'll
rely on assumptions about the underlying problem.
You can use LingPipe to do this on a probabilistic scale as
follows. Assume -100 is 0% probability of positive,
and +100 is 100% probability of positive, and 0 is
50% probability of positive (that is, we can't tell):
score(text) = 200 * p(positive|text) - 100
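For instance, here's a minimal sketch of that mapping in Java,
assuming a classifier trained on just the two categories
"positive" and "negative"; the variable names are made up
and the method names follow the LingPipe 3.x API:

    JointClassification c = classifier.classify(emailText);
    double pPositive = 0.0;
    for (int rank = 0; rank < c.size(); ++rank)
        if ("positive".equals(c.category(rank)))
            pPositive = c.conditionalProbability(rank);
    // map p(positive|text) in [0,1] onto [-100,100]
    double score = 200.0 * pPositive - 100.0;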
Now the problem with doing this is that the probability
estimates p are based on Markovian (naive Bayes)
assumptions about independence, and most scores for
text of any length will wind up being very close to
-100 or +100. To overcome this problem, you need to
fudge the scores, as is common in speech recognition
(for the same reason -- scaling a model with erroneous
independence assumptions). So rather than using
the probability p(positive|text) from the classifier,
instead use the cross-entropy rates:
hPos = log2 p(positive,text) / length(text)
hNeg = log2 p(negative,text) / length(text)
pPos = 2 ** hPos
pNeg = 2 ** hNeg
The cross-entropy rates hPos and hNeg are what you get
from the score() method on the JointClassification
returned by an LM classifier.
Now, to compute scores, what you want to do is this:
score(text) = 200 * pPos / (pPos + pNeg) - 100
This should give you scores that are "reasonable", but
have no probabilistic interpretation.
Note that this only requires two models -- there is
no neutral model.
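Putting the pieces together, here's a rough sketch of that
scoring function as a Java method (JointClassification and
LMClassifier are in com.aliasi.classify); satisfactionScore
is just an illustrative name, and again the calls follow the
3.x API on a classifier trained with only the "positive" and
"negative" categories:

    static double satisfactionScore(LMClassifier classifier,
                                    CharSequence text) {
        JointClassification c = classifier.classify(text);
        double hPos = Double.NEGATIVE_INFINITY;
        double hNeg = Double.NEGATIVE_INFINITY;
        for (int rank = 0; rank < c.size(); ++rank) {
            // score(rank) is the per-char log2 joint estimate
            if ("positive".equals(c.category(rank)))
                hPos = c.score(rank);
            else if ("negative".equals(c.category(rank)))
                hNeg = c.score(rank);
        }
        double pPos = Math.pow(2.0, hPos);
        double pNeg = Math.pow(2.0, hNeg);
        return 200.0 * pPos / (pPos + pNeg) - 100.0;
    }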
But is positive/negative really a one-dimensional scale
with neutral in the middle?
A better model might be a two-dimensional one where
each e-mail can be positive to some degree and negative
to some degree, giving it a score (pos,neg) where
a score of (100,0) would be a love letter and
(0,100) a hate letter, with (100,100) a good cop/bad
cop email, and (0,0) a very neutral business letter.
These numbers could be estimated in the same way as
above, but with a fixed score for the opposite category
as you'll find in the BinaryLMClassifier implementation.
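Here's a rough sketch of what that two-dimensional scoring
could look like, reusing the cross-entropy rates hPos and
hNeg from above plus a fixed background rate h0 standing in
for BinaryLMClassifier's constant opposite-category model;
the 0-100 mapping and h0 are assumptions for illustration,
not anything LingPipe fixes for you:

    double pPos = Math.pow(2.0, hPos);
    double pNeg = Math.pow(2.0, hNeg);
    double p0   = Math.pow(2.0, h0);   // fixed background rate
    double posDegree = 100.0 * pPos / (pPos + p0);  // 0..100
    double negDegree = 100.0 * pNeg / (pNeg + p0);  // 0..100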
In any of these cases, what you get isn't really
how positive or how negative something is, but rather
how likely it is to contain positive sentiment and
how likely it is to contain negative sentiment.
You might want to look at Pang and Lee's paper on
assigning scores, too, but they're also looking
at it as a multi-class classification (easier with
five stars than a -100 to 100 scale).
Their more recent paper on congressional opinion
also looks interesting.
I'd guess there's been a lot of work on this problem
since then, and I'd love to hear what you find.
- Bob Carpenter