15215 [Clip] Re: Removing stopwords from word list

  • jonas_ramus
    Jul 14, 2006
      Sheri,

      I immediately tested this third version. There are some problems
      with the output as follows:

      1. With the following input...

      word list: ntf-wordlist.txt (16,000)
      stop words: ntf-stopwords (250)

      the last lines of the output are (line numbers added)...

      Line
      15500 Yucca
      15501 Yumen
      15502 Zweifacher|Zwei-Tank-Systeme|Zwei|Zweckverband|....
      15503 Unique Words with Stop Words Removed.

      That is, the stop words (here the Z-words) are written to the output
      instead of being removed, and line 15503 contains the text of the
      final message.

      2. With the following input...

      word list: B-words (1,176)
      stop words: A+B-words (2,233)

      the last lines of the output are (line numbers added)...

      Line
      1056 A-Klasse
      1057 A-Klasse-Prototypen
      1058 Binnennachfrage|Binnenmarkt|Binnenland...
      1059 Büssem|Bürotechnik|Büros|Büroleitung
      1060 Documents\NotetabBetaTest\badregexpreplace.txt" Pattern 1:
      1061 Unique Words with Stop Words Removed.

      That is, out of 2,233 stop words, 1,526 appear in the output along
      with the result, plus some additional text.
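
For reference, the result both tests should produce can be sketched outside NoteTab. The following is a minimal Python sketch, not the clip itself: the function name `remove_stop_words` and the tiny sample lists are made up for illustration, matching is assumed to be case-sensitive on whole words, and the A+B list is assumed to contain every B-word. It uses a plain set lookup rather than search patterns, so no stop word or pattern text can end up in the output.

```python
def remove_stop_words(words, stop_words):
    """Return the unique words, in first-seen order, that are not stop words."""
    # Whole-word, case-sensitive matching (an assumption of this sketch).
    stop = {w.strip() for w in stop_words if w.strip()}
    seen = set()
    result = []
    for word in words:
        word = word.strip()
        # Skip blank lines, stop words, and words already emitted.
        if word and word not in stop and word not in seen:
            seen.add(word)
            result.append(word)
    return result


# Hypothetical miniature of test 2: every B-word is also in the A+B list,
# so nothing should survive.
b_words = ["Binnenland", "Binnenmarkt", "Büros"]
a_and_b_words = ["A-Klasse", "Abend", "Binnenland", "Binnenmarkt", "Büros"]
print(remove_stop_words(b_words, a_and_b_words))  # prints []
```

Under these assumptions, test 2 should yield an empty word list, and test 1 should end with the last non-stop word (such as Yumen) followed only by the final message.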

      Thanks for your great help and patience in this matter!

      Flo