15241 Re: [Clip] Re: Removing stopwords from word list
Jul 15, 2006
On 7/15/06, jonas_ramus <jonas_ramus@...> wrote:
> Thanks! This is also an interesting solution. Evidently, it works
> with NT 5.0 Beta and NT 4.95 as well.
> It runs into some problems, however, if the stop word list is bigger
> than the word list. This works correctly with just a few words. When
> swapping the word list (16,000 words) and the stop word list (250 words),
> the output file should be empty, since all the stop words occur in the
> word list.
Your comment that "all the stop words occur in the word list" tipped
me off to my error. The speed modification that I made (running
Replace without a W switch) breaks down if there is a stopword that
does not occur in the word list being cleaned.
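To illustrate why a single missing stopword can break the whole run, here is a rough Python model of the forward-only loop. It is not the Clip itself. The function name is hypothetical. It treats the document as a list of lines and keeps a cursor where the next search starts. It also assumes that a failed search leaves the cursor at the end of the document, which the post does not confirm for NoteTab. Under that assumption, the model reproduces the reported symptom: with the lists swapped, the output is not empty.

```python
# Hypothetical model of the forward-only Replace loop (not NoteTab code).
# ASSUMPTION: a failed search strands the cursor at the end of the document.

def remove_stopwords_forward_only(words: list[str], stopwords: list[str]) -> list[str]:
    """Delete each stopword line, searching only forward from the cursor."""
    doc = list(words)  # work on a copy of the word list
    cursor = 0
    for stop in stopwords:
        try:
            # Search from the cursor onward only (no whole-file search).
            pos = doc.index(stop, cursor)
        except ValueError:
            # Miss: the cursor is assumed to end up past the last line,
            # so every later search fails as well.
            cursor = len(doc)
            continue
        del doc[pos]   # delete the matching line
        cursor = pos   # the next search resumes where this match was


    return doc


if __name__ == "__main__":
    # Swapped lists: every line of the "word list" is also a stopword,
    # so the result should be empty, but the first miss ("a") breaks the scan.
    print(remove_stopwords_forward_only(["b", "d"], ["a", "b", "c", "d", "e"]))
    # prints ['b', 'd']
```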
If you add a few words to your ntf-stopwords.txt file that are NOT
contained in ntf-wordlist.txt, you will reproduce my error even when the
files are used as planned, not only when they are swapped.
The compromise solution is to reset the search to the top of the file
whenever this happens, by adding ^!IfError ^!Jump 1 inside the loop,
as shown below:
^!Replace "^%stopwords^%index%%^p" >> "" CS
^!IfError ^!Jump 1
^!If ^%index%=^%stopwords0% skip
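The same hypothetical Python model can show the effect of the ^!IfError ^!Jump 1 line. On a miss, the cursor goes back to the top instead of staying stranded, and the loop moves on to the next stopword without retrying the missed one. The model assumes both lists are sorted in the same order, which the forward-only search appears to rely on. It also matches whole lines, whereas the Clip matches the stopword text followed by a paragraph mark (^p). It is a sketch of the idea, not the actual clip.

```python
# Hypothetical model of the loop with the reset added (not NoteTab code).
# ASSUMPTION: both lists are sorted the same way, so after a hit the next
# stopword (if present) lies at or after the cursor.

def remove_stopwords_with_reset(words: list[str], stopwords: list[str]) -> list[str]:
    """Delete each stopword line; after a failed search, restart from the top."""
    doc = list(words)  # work on a copy of the word list
    cursor = 0
    for stop in stopwords:
        try:
            # Fast path: search forward from the last match.
            pos = doc.index(stop, cursor)
        except ValueError:
            # Miss: reset to the top of the document, like ^!IfError ^!Jump 1.
            cursor = 0
            continue
        del doc[pos]   # delete the matching line
        cursor = pos   # the next search resumes where this match was
    return doc


if __name__ == "__main__":
    # Swapped lists: every word is a stopword, so nothing should remain.
    print(remove_stopwords_with_reset(["b", "d"], ["a", "b", "c", "d", "e"]))
    # prints []
```

The search still resumes from the last match while stopwords keep being found. Only a miss costs a full rescan from the top, which is why this is a compromise between speed and correctness.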