15216 [Clip] Re: Removing stopwords from word list
Jul 14, 2006

Hi Flo,
Yahoo has wrapped a couple of long lines. The first was actually
commented out; it begins with.
The wrapped portion is getting written into your file. It looks like
it actually wrapped twice, once at "Documents" and again at "^%Pat".
Another is nearer the bottom, an ^!Info command; it looks like it
wrapped at the word "Unique".
Please unwrap those long lines and try it again. You could actually
remove the long comment line along with the two comment lines above it.
--- In firstname.lastname@example.org, "jonas_ramus" <jonas_ramus@...> wrote:
> I immediately tested this third version. There are some problems
> with the output as follows:
> 1. When taking as...
> word list: ntf-wordlist.txt (16,000)
> stop words: ntf-stopwords (250)
> the last lines of the output are (line numbers added)...
> 15500 Yucca
> 15501 Yumen
> 15502 Zweifacher|Zwei-Tank-Systeme|Zwei|Zweckverband|....
> 15503 Unique Words with Stop Words Removed.
> That is, it outputs all the stop words (Z-words). Line 15503 is
> the text of the final message.
> 2. When taking as...
> word list: B-words (1,176)
> stop words: A+B-words (2,233)
> the last lines of the output are (line numbers added)...
> 1056 A-Klasse
> 1057 A-Klasse-Prototypen
> 1058 Binnennachfrage|Binnenmarkt|Binnenland...
> 1059 Büssem|Bürotechnik|Büros|Büroleitung
> 1060 Documents\NotetabBetaTest\badregexpreplace.txt" Pattern 1:
> 1061 Unique Words with Stop Words Removed.
> That is, it outputs 1,526 of the 2,233 stop words along with the
> result, plus some additional text.
> Thanks for your great help and patience in this matter!
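If you want to cross-check the clip's output outside NoteTab, here is a minimal Python sketch. It is not NoteTab clip code and not the clip from this thread; the function names `remove_stopwords` and `leaked_stopwords` are hypothetical. It assumes one word per line and exact, case-sensitive matching, which may differ from what the clip does. The second function splits output lines on "|" because the leaked lines quoted above are pipe-joined lists of stop words.

```python
from typing import Iterable, List


def remove_stopwords(words: Iterable[str], stopwords: Iterable[str]) -> List[str]:
    """Return the unique words not in `stopwords`, keeping first-seen order.

    Assumption: exact, case-sensitive matching after stripping whitespace
    (the clip's own matching rules are not shown in this thread).
    """
    stop = {s.strip() for s in stopwords}
    seen = set()
    result = []
    for word in words:
        word = word.strip()
        # Skip blank lines, stop words, and duplicates.
        if word and word not in stop and word not in seen:
            seen.add(word)
            result.append(word)
    return result


def leaked_stopwords(output_lines: Iterable[str], stopwords: Iterable[str]) -> List[str]:
    """Return, sorted, every stop word that still appears in the output.

    Each line is split on "|" so pipe-joined runs of stop words are caught.
    """
    stop = {s.strip() for s in stopwords}
    leaked = set()
    for line in output_lines:
        for token in line.split("|"):
            token = token.strip()
            if token in stop:
                leaked.add(token)
    return sorted(leaked)


if __name__ == "__main__":
    words = ["A-Klasse", "Binnenmarkt", "Yucca", "Zwei", "Yucca"]
    stops = ["Binnenmarkt", "Zwei"]
    print(remove_stopwords(words, stops))  # ['A-Klasse', 'Yucca']
    print(leaked_stopwords(["Yucca", "Zwei|Zweckverband"], ["Zwei", "Zweckverband"]))
    # ['Zweckverband', 'Zwei']
```

Run `leaked_stopwords` over the clip's output file and your stop-word list: an empty result means no stop word survived; anything else shows exactly which ones leaked through.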