15241 Re: [Clip] Re: Removing stopwords from word list

  • Bob McAllister
    Jul 15, 2006
      On 7/15/06, jonas_ramus <jonas_ramus@...> wrote:

      > Bob,
      > Thanks! This is also an interesting solution. Evidently, it works
      > with NT 5.0 Beta and NT 4.95 as well.
      > It runs into some problems, however, if the stop word list is bigger
      > than the word list. This works correctly with just a few words. When
      > replacing the word list (16,000) and the stop word list (250) with
      > each other, the output file should be empty since all the stop words
      > occur in the word list.


      Your comment that "all the stop words occur in the word list" tipped
      me off to my error. The speed modification that I made (running
      Replace without a W switch) breaks down if there is a stopword that
      does not occur in the word list being cleaned.

      If you modify your test files by adding a few words to your
      ntf-stopwords.txt file that are NOT contained in ntf-wordlist.txt,
      then you will catch my error both when using the files as intended
      and when swapping them.

      The compromise solution is to reset the search to the top of the file
      whenever this situation occurs, by adding ^!IfError ^!Jump 1 to the
      loop as shown below.
      ^!Replace "^%stopwords^%index%%^p" >> "" CS
      ^!IfError ^!Jump 1
      ^!Inc %index%
      ^!If ^%index%=^%stopwords0% skip
      ^!Goto loop
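
For anyone who wants to see the failure mode outside NoteTab, here is a rough Python sketch of the same idea. It is only a simulation, not Clip code: the function name `remove_stopwords` and the `reset_on_miss` flag are made up for illustration, and it assumes that a failed forward search leaves the cursor where no later stopword can be found (the exact cursor behavior in NoteTab is not spelled out above).

```python
def remove_stopwords(words, stopwords, reset_on_miss=True):
    """Simulate a forward-only search-and-delete over a word list.

    Hypothetical model of the clip loop: each stopword is searched for
    from the current cursor position onward, never from the top.
    """
    remaining = list(words)
    cursor = 0
    for stop in stopwords:
        try:
            hit = remaining.index(stop, cursor)  # search forward from cursor
        except ValueError:
            # Stopword not found. ASSUMPTION: without a reset, the cursor is
            # stranded at the end, so every later search fails too.
            cursor = 0 if reset_on_miss else len(remaining)
            continue
        del remaining[hit]  # drop the matched line
        cursor = hit        # continue from where the match was
    return remaining


words = ["apple", "and", "banana", "the", "zebra"]
stopwords = ["and", "of", "the"]  # "of" is not in the word list

with_reset = remove_stopwords(words, stopwords, reset_on_miss=True)
without_reset = remove_stopwords(words, stopwords, reset_on_miss=False)
print(with_reset)
print(without_reset)
```

With the reset, "the" is still removed after the miss on "of"; without it, "the" survives in the output, which is the kind of leftover the test with extra stopwords should reveal.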

      Bob McAllister