
multi-label classification

  • lenat79
    Message 1 of 2, Jun 24, 2007
      Hello Bob,

      Thank you very much for your previous answer.
      I would deeply appreciate your suggestion in the following question.
      I am going to use the LingPipe LMClassifier for multi-label news
      classification. I can think of two ways to do it. The first way is
      to use DynamicLMClassifier, assigning to the content all categories
      with conditionalProbability or jointLog2Probability above a defined
      threshold. Here I am afraid of defining the threshold; I can't even
      imagine what it should be.
      The second way is to build a BinaryLMClassifier for each category
      and assign to the content all "true" categories. In this case I
      also have to define some crossEntropyThreshold. I am also worried
      about the multiplicity of classifiers (I have about 400 categories)
      and the resulting large drop in classification speed.
      What would you suggest?

      Also, I saw you've added a PerceptronClassifier in the last release.
      What do you expect its performance to be compared to
      DynamicLMClassifier? Would you suggest using it instead?

      Thanks,
      Lena
    • Bob Carpenter
      Message 2 of 2, Jul 6, 2007
        I missed responding to Lena Tenenboim's question on the group
        a while back, so here goes.

        > I am going to use LingPipe LMClassifier for multi-label news
        > classification.

        What to do is going to depend heavily on whether the
        categories are exhaustive (every input falls in a
        category) and exclusive (no input falls into two categories).
        That's the situation all of our classifiers were designed
        for (including the new ones).

        > I can think of two ways to do it. The first way is to use
        > DynamicLMClassifier, assigning to the content all categories with
        > conditionalProbability or jointLog2Probability above a defined
        > threshold. Here I am afraid of defining the threshold; I can't
        > even imagine what it should be.

        If the above conditions are met, you don't have to define
        a threshold. Just take the best-scoring category.

        If you want to try to reject some inputs because they don't
        match any categories, the best thing to look at is cross-entropy
        rates, which are reflected in the score() method in the resulting
        classifications from the LM-based classifiers. You need to look
        at scores for matching and non-matching docs and use that to set
        a threshold.
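        To make that threshold-setting step concrete, here's a minimal,
        self-contained Java sketch. None of these names are LingPipe API
        (in practice the scores would come from the classifications'
        score() method); it just takes observed scores for in-category
        and out-of-category docs and picks the cutoff that best
        separates them.

```java
// Illustrative sketch, not LingPipe API: given score() values (e.g.
// cross-entropy-based scores) for documents that truly belong to a
// category and documents that don't, pick the acceptance cutoff that
// maximizes accuracy on that held-out data.
public class ThresholdPicker {

    public static double pickThreshold(double[] matching, double[] nonMatching) {
        double[] all = new double[matching.length + nonMatching.length];
        System.arraycopy(matching, 0, all, 0, matching.length);
        System.arraycopy(nonMatching, 0, all, matching.length, nonMatching.length);
        double best = Double.NEGATIVE_INFINITY;
        double bestAcc = -1.0;
        for (double cand : all) {  // every observed score is a candidate cutoff
            int correct = 0;
            for (double s : matching) if (s >= cand) ++correct;   // accepted, should be
            for (double s : nonMatching) if (s < cand) ++correct; // rejected, should be
            double acc = correct / (double) all.length;
            if (acc > bestAcc) { bestAcc = acc; best = cand; }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] match = { -1.2, -1.5, -0.9, -1.1 };  // in-category scores
        double[] nonMatch = { -3.0, -2.8, -3.5 };     // out-of-category scores
        System.out.println(pickThreshold(match, nonMatch)); // prints -1.5
    }
}
```

        With more realistic data you'd likely trade off precision and
        recall rather than raw accuracy, but the shape of the loop is
        the same.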

        Another way to do it would be to have a rejection/junk
        or none-of-the-above category. But those are tricky to train,
        because there's no way to get a balance of all the docs that
        aren't one of your categories.

        > The second way is to build a BinaryLMClassifier for each category
        > and assign to the content all "true" categories. In this case I
        > also have to define some crossEntropyThreshold.

        This is the better approach if the categories are not
        mutually exclusive and exhaustive. Again, you have
        the choice of negative models or threshold. Again, set the thresholds
        empirically on a per-category basis. The way to do this is
        with cross-validation. Divide your data up into ten piles per
        category, then for each pile, train on all other piles and test
        on that pile and choose the settings that work the best.
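        The fold-splitting loop above can be sketched in plain Java.
        These names are illustrative, not LingPipe API; trainAndScore
        stands in for "train a BinaryLMClassifier with a candidate
        threshold and measure accuracy on the held-out pile":

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Illustrative sketch, not LingPipe API: split a category's documents
// into numFolds piles; for each pile, train on the rest and score on
// the held-out pile; report the average held-out score.
public class CrossValidate {

    public static <T> double crossValidate(List<T> docs, int numFolds,
            BiFunction<List<T>, List<T>, Double> trainAndScore) {
        double total = 0.0;
        for (int fold = 0; fold < numFolds; ++fold) {
            List<T> train = new ArrayList<>();
            List<T> test = new ArrayList<>();
            for (int i = 0; i < docs.size(); ++i) {
                if (i % numFolds == fold) test.add(docs.get(i)); // held-out pile
                else train.add(docs.get(i));                     // the other piles
            }
            total += trainAndScore.apply(train, test);
        }
        return total / numFolds;
    }
}
```

        You'd run this once per candidate threshold per category and
        keep the threshold with the best average held-out score.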

        > I am also worried about the multiplicity of classifiers (I have
        > about 400 categories) and the resulting large drop in
        > classification speed.

        It depends on the amount of training data you have.
        You can also prune the models through the counters
        underlying the language models.

        The character LMs run at 1-2M chars/second, so 500
        categories would run at 2-4K chars/second.

        Generally, when people try to scale something like this, they
        look for a so-called "blocking" (de-duplication/linkage terminology)
        or "fast match" (speech reco terminology) or "umbrella" (Andrew
        McCallum's term) to find possible matches efficiently before
        running more expensive classifiers on the candidates.

        A good way to do this is with hierarchical classification.
        Then you just classify at the top level, take any likely
        candidates, then proceed down the hierarchy. That way,
        you don't waste time trying to separate tennis from golf
        when it's not an article about sports.
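        Here's a rough sketch of that two-stage idea in plain Java, with
        toy keyword scorers standing in for real LM classifiers (all
        names here are illustrative, not LingPipe API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Illustrative sketch, not LingPipe API: score every top-level topic;
// only for topics above the threshold, run that topic's subcategory
// scorers; return the best-scoring leaf category.
public class HierarchicalClassify {

    public static String classify(String doc,
            Map<String, ToDoubleFunction<String>> topLevel,
            Map<String, Map<String, ToDoubleFunction<String>>> subLevel,
            double topThreshold) {
        String bestLeaf = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, ToDoubleFunction<String>> top : topLevel.entrySet()) {
            if (top.getValue().applyAsDouble(doc) < topThreshold)
                continue; // fast match rejected this whole branch
            for (Map.Entry<String, ToDoubleFunction<String>> sub
                     : subLevel.get(top.getKey()).entrySet()) {
                double s = sub.getValue().applyAsDouble(doc);
                if (s > bestScore) {
                    bestScore = s;
                    bestLeaf = sub.getKey();
                }
            }
        }
        return bestLeaf;
    }

    // Toy demo: keyword-presence "classifiers" stand in for LM scores.
    public static String demo(String doc) {
        Map<String, ToDoubleFunction<String>> top = new HashMap<>();
        top.put("sports", d -> count(d, "tennis") + count(d, "golf"));
        top.put("politics", d -> count(d, "senate"));
        Map<String, Map<String, ToDoubleFunction<String>>> sub = new HashMap<>();
        Map<String, ToDoubleFunction<String>> sportsSub = new HashMap<>();
        sportsSub.put("tennis", d -> count(d, "tennis"));
        sportsSub.put("golf", d -> count(d, "golf"));
        sub.put("sports", sportsSub);
        Map<String, ToDoubleFunction<String>> polSub = new HashMap<>();
        polSub.put("senate", d -> count(d, "senate"));
        sub.put("politics", polSub);
        return classify(doc, top, sub, 1.0);
    }

    private static int count(String doc, String word) {
        return doc.contains(word) ? 1 : 0;
    }
}
```

        The savings come from the continue: with 400 leaf categories
        under, say, 20 topics, most documents only ever see one or two
        topics' worth of expensive classifiers.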

        > Also, I saw you've added a PerceptronClassifier in the last release.
        > What do you expect its performance to be compared to
        > DynamicLMClassifier? Would you suggest using it instead?

        I haven't actually played around much with perceptrons, and haven't
        done any large-scale evals. The main advantage of perceptrons is
        that they provide large-margin discriminative training over arbitrary
        features. So it'll largely depend on what you use as features. You
        could use character n-grams of various lengths, token n-grams,
        tokens plus part-of-speech tags, or whatever.

        There are two main problems with perceptrons. One, they're
        essentially binary classifiers. You can make them multi-way
        by just using their scores, but our implementation won't be very
        efficient for that (because the basis/support vectors won't be shared).
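        The "using their scores" trick is just an argmax over the
        per-category binary scores. A hedged sketch (illustrative names
        only, not LingPipe's perceptron implementation):

```java
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Illustrative sketch, not LingPipe API: run K binary one-vs-rest
// scorers on a document and return the highest-scoring category.
public class OneVsRest {

    public static String bestCategory(String doc,
            Map<String, ToDoubleFunction<String>> binaryScorers) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, ToDoubleFunction<String>> e : binaryScorers.entrySet()) {
            double s = e.getValue().applyAsDouble(doc); // one binary decision per category
            if (s > bestScore) { bestScore = s; best = e.getKey(); }
        }
        return best;
    }
}
```

        The inefficiency Bob mentions is visible here: every category's
        scorer runs independently, so nothing (features, support
        vectors) is shared across the K evaluations.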

        The second main problem is that perceptrons aren't dynamic -- you train
        them all at once iteratively, at least with the averaged perceptron
        implementation in LingPipe. And they're very costly to train and
        run in both size and compute time.

        So I'm afraid you'll have to make do with this weasely answer.
        If you do evals and get results, feel free to post to the list.

        The TF/IDF classifier should scale well, but the KNN classifier
        stores everything and is thus even more difficult to scale
        than the perceptrons.

        - Bob Carpenter
        Alias-i