12902 Re: Extracting words from a file
- Dec 3, 2004
--- In email@example.com, "franz_sternbald"
> (The use of all this is to produce an
> index or thesaurus of keywords in a text database.)
Hi Franz,
I just happened across this thread. If I have understood your needs
correctly, why not just reduce the list to a single column of words,
and sort them case sensitive?
1. Replace all spaces in the document with "^P" to put each word on
its own line (ignore punctuation, if you like).
2. Sort the list CASE SENSITIVE
3. Delete the lower case words
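For anyone who would rather script it than do it by hand, here is a rough sketch of the same three steps in Python. It is not NoteTab clip code; the function name capitalized_words and the unique option are made up for illustration, and it assumes "lower case words" means words that start with a lower case letter.

```python
import string


def capitalized_words(text, unique=True):
    """Hypothetical helper: split, sort case sensitive, drop lower case words."""
    # Step 1: one word per entry. Split on any whitespace and trim
    # punctuation from both ends of each word.
    words = [w.strip(string.punctuation) for w in text.split()]
    words = [w for w in words if w]  # discard entries that were only punctuation

    # Step 2: case-sensitive sort. Python compares by code point, so
    # words starting with A-Z come before words starting with a-z.
    words.sort()

    # Step 3: delete the words that start with a lower case letter.
    kept = [w for w in words if not w[0].islower()]

    # Optional: remove duplicates, keeping the sorted order.
    if unique:
        kept = sorted(set(kept))
    return kept


if __name__ == "__main__":
    sample = "The quick Brown fox. The Index of keywords, in Berlin!"
    print(capitalized_words(sample))
```

Note that words which merely start a sentence (like "The" above) survive too, so the result still needs a look by eye, same as with the manual sort.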
500 K files should contain about 80,000 words or so. Shouldn't take
more than a few minutes to do this by hand. If you have a lot of
files you can always write down the keystrokes you use, then do the
sort by Menu commands (^!Menu Modify/...). I think there's a
configuration switch to change sorting behaviour (remove duplicates
or not; case sensitive or not).