
LingPipe 2.2.1 patched for Java 1.4

  • Bob Carpenter
    Message 1 of 3, Mar 27, 2006
      I rebuilt LingPipe 2.2.1 on the web site
      so that it would work with Java 1.4. I
      hope this clears up major/minor version
      problems.

      - Bob Carpenter
      Alias-i
    • Sanjay Singh
    Message 2 of 3, Mar 29, 2006
        Hi,

        I have a question regarding the sentiment API.

        The LMClassifier does a good job in getting the
        best category among "pos" and "neg". How can I get a
        rank/probability/confidence number from the
        classifiers, so that

        a) if the classifier is VERY confident, I will accept
        the pos/neg classification.
        b) If the classifier is LESS confident, I may direct
        the classification to another process.
        c) If the classifier is NOT confident at all I will
        throw away the classification subject.


        Thanks
        --
        Jay



        --- Bob Carpenter <carp@...> wrote:

        > I rebuilt LingPipe 2.2.1 on the web site
        > so that it would work with Java 1.4. I
        > hope this clears up major/minor version
        > problems.
        >
        > - Bob Carpenter
        > Alias-i
        >


      • carp@alias-i.com
        Message 3 of 3, Mar 29, 2006
          Sanjay Singh wrote:
          > Hi,
          >
          > I have a question regarding the sentiment API.
          >
          > The LMClassifier does a good job in getting the
          > best category among "pos" and "neg". How can I get a
          > rank/probability/confidence number from the
          > classifiers, so that
          >
          > a) if the classifier is VERY confident, I will accept
          > the pos/neg classification.
          > b) If the classifier is LESS confident, I may direct
          > the classification to another process.
          > c) If the classifier is NOT confident at all I will
          > throw away the classification subject.

          You'll have to get that through the API -- the tutorial
          doesn't cover it (yet, anyway).

          The classification API is set up to run through
          implementations of a very simple interface:

          interface Classifier {
              Classification classify(Object input);
          }

          Depending on the implementation, you may be able to
          cast the resulting classification to a more specific
          class. The language-model-based classifiers return
          instances of classify.JointClassification. Sorting
          through the inheritance, a JointClassification provides
          the following methods:

          Classification
              String bestCategory();

          RankedClassification extends Classification
              int size();
              String category(int rank);

          ScoredClassification extends RankedClassification
              double score(int rank);

          ConditionalClassification extends ScoredClassification
              double conditionalProbability(int rank);

          JointClassification extends ConditionalClassification
              double jointLog2Probability(int rank);
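
          For example, here is a minimal sketch that walks the
          whole n-best list through those accessors (assuming the
          cast to JointClassification succeeds and that classifier
          and input are in scope):

          JointClassification c
              = (JointClassification) classifier.classify(input);
          for (int rank = 0; rank < c.size(); ++rank) {
              // Each accessor below comes from the hierarchy above.
              System.out.println(rank
                  + " category=" + c.category(rank)
                  + " score=" + c.score(rank)
                  + " condProb=" + c.conditionalProbability(rank)
                  + " jointLog2Prob=" + c.jointLog2Probability(rank));
          }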

          The conditional probability is an estimate of the probability
          of the category given the input, and is the thing to use
          to set thresholds. The joint probability includes the
          probability of the object being classified, and is thus
          not scaled to be comparable across different object inputs.

          So if your categories are "pos" and "neg", you can get LingPipe's
          estimate of "pos" confidence by:

          String input = ...;
          ConditionalClassification classification
              = (ConditionalClassification) classifier.classify(input);
          double posConfidence = Double.NEGATIVE_INFINITY;
          if (classification.size() > 0
                  && classification.category(0).equals("pos"))
              posConfidence = classification.conditionalProbability(0);
          else if (classification.size() > 1
                  && classification.category(1).equals("pos"))
              posConfidence = classification.conditionalProbability(1);

          This is being extra paranoid in case for some reason the
          classifier doesn't return enough results -- this shouldn't
          happen with the LM classifiers built in the usual way. So
          this could be simplified to:

          int posIndex = classification.category(0).equals("pos") ? 0 : 1;
          double posConfidence = classification.conditionalProbability(posIndex);
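
          To implement the three-way decision from the original
          question, you could then compare that confidence against
          two cutoffs. A sketch, where the 0.9/0.6 values are
          placeholders to tune and accept()/routeElsewhere() stand
          in for your own handling:

          double ACCEPT = 0.9;   // placeholder cutoff
          double REVIEW = 0.6;   // placeholder cutoff
          double bestConfidence = classification.conditionalProbability(0);
          if (bestConfidence >= ACCEPT) {
              accept(classification.bestCategory());  // (a) very confident
          } else if (bestConfidence >= REVIEW) {
              routeElsewhere(input);                  // (b) less confident
          } else {
              // (c) not confident at all -- discard the input
          }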

          I should provide a warning: if the input is very long, these
          probabilities will quickly approach 0 or 1 and may even round off
          to 0 or 1 with 64-bit floating-point arithmetic. This is a problem
          with the underlying model's naivete -- it doesn't account for
          any of the topic-level or longer-distance conditional
          probability structure, so its confidence tends to get exaggerated. (This is
          a problem with almost all text classifiers, and is an even greater
          problem with speech-based acoustic classifiers.)

          A better thing to use for the ranking is the cross-entropy
          rate, which is defined to be the negated joint log (base 2)
          probability divided by the input length (note this call
          needs the JointClassification cast rather than the
          ConditionalClassification one):

          - classification.jointLog2Probability(posIndex) / input.length();

          The unit here is bits-per-character -- it's what it'd
          cost to compress using the positive model in an entropy coder.
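
          For instance, a sketch that rescores every category by
          cross-entropy rate and keeps the cheapest-to-compress
          one (classifier and input as above):

          JointClassification joint
              = (JointClassification) classifier.classify(input);
          String bestCat = null;
          double bestRate = Double.POSITIVE_INFINITY;
          for (int rank = 0; rank < joint.size(); ++rank) {
              double bitsPerChar
                  = -joint.jointLog2Probability(rank) / input.length();
              if (bitsPerChar < bestRate) {  // lower rate = better fit
                  bestRate = bitsPerChar;
                  bestCat = joint.category(rank);
              }
          }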

          I'm afraid we don't provide any help in setting thresholds
          for these -- I'd suggest inspecting your results and setting
          it empirically.
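
          One way to do that empirical pass, sketched here assuming
          you have a held-out array texts[] with gold-standard
          labels[] (both hypothetical names):

          double[] candidates = { 0.6, 0.7, 0.8, 0.9, 0.95 };
          for (int j = 0; j < candidates.length; ++j) {
              double t = candidates[j];
              int accepted = 0;
              int correct = 0;
              for (int i = 0; i < texts.length; ++i) {
                  ConditionalClassification c = (ConditionalClassification)
                      classifier.classify(texts[i]);
                  if (c.conditionalProbability(0) >= t) {
                      ++accepted;
                      if (c.category(0).equals(labels[i]))
                          ++correct;
                  }
              }
              System.out.println("threshold=" + t
                  + " coverage=" + accepted + "/" + texts.length
                  + " accuracy="
                  + (accepted == 0 ? 0.0 : (double) correct / accepted));
          }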

          One more suggestion: if you have only a positive model, then
          you can actually set up a classifier based on this kind of
          threshold without having to train a negative model. Sometimes
          this works better, especially if the negative instances do not
          form a coherent topical/genre/etc. collection.
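
          A minimal sketch of that one-model setup, assuming
          lm.NGramProcessLM with its train(CharSequence) and
          log2Estimate(CharSequence) methods, positive training
          texts in a hypothetical posTexts array, and a placeholder
          cutoff of 2.5 bits per character:

          NGramProcessLM posLM = new NGramProcessLM(5);  // 5-gram character LM
          for (int i = 0; i < posTexts.length; ++i)
              posLM.train(posTexts[i]);

          // Accept as "pos" only if the positive model compresses
          // the input more cheaply than the cutoff.
          double bitsPerChar = -posLM.log2Estimate(input) / input.length();
          boolean acceptAsPos = bitsPerChar < 2.5;       // placeholder cutoff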

          - Bob