12887 Re: Extracting words from a file
Dec 1, 2004

Hi,
Thanks for the solutions you presented here...
Actually, there are two different ways to do this job: 1. extract
the words you want, or 2. delete the words you don't want. The
problem with #2 is this: since I'm evaluating text databases of
500 KB, 1 MB or more, I would have to delete an enormous number of
characters and strings that don't match the search criteria. That
would take dozens of command lines and RegExes just to reduce the
file. So I tried it the other way round, i.e. by extracting only
the matching words.
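For comparison outside NoteTab, here is a minimal Python sketch of approach 1 (extract, don't delete). It is not clip code. It assumes the search criterion can be written as one regular expression; the pattern `WORD` and the function `extract_words` are hypothetical. Reading line by line means a 1 MB database never has to be rewritten or held in a pasteboard.

```python
import re
from typing import Iterable, Iterator

# Hypothetical search criterion: a run of word characters,
# optionally joined to further runs by hyphens ("Hewlett-Packard").
WORD = re.compile(r"\w+(?:-\w+)*")

def extract_words(lines: Iterable[str], pattern: re.Pattern = WORD) -> Iterator[str]:
    """Yield every match of `pattern`, one line at a time.

    Only the matches are collected; non-matching text is simply
    skipped instead of being deleted from the file.
    """
    for line in lines:
        # The group in WORD is non-capturing, so findall returns whole matches.
        yield from pattern.findall(line)
```

Usage, with a hypothetical file name: `with open("database.txt", encoding="utf-8") as f: words = list(extract_words(f))`. A file object yields lines, so memory use stays proportional to one line plus the results.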
Using the Pasteboard function is a clever solution! With files over
500 KB, however, it takes an intolerably long time. So far no error
message has shown up, but I stopped the procedure after half an hour.
Maybe a mixture of both approaches would be the best solution: first
reduce the file by eliminating certain strings, and then extract the
words I need. (The purpose of all this is to produce an index or
thesaurus of keywords for a text database.)
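The mixed approach (reduce first, then extract) might look like this in Python. It ends with a sorted keyword index with counts. This is only a sketch: the `NOISE` patterns (markup tags, URLs, numbers) and the `KEYWORD` criterion (words of four or more letters, hyphens allowed) are hypothetical stand-ins for whatever strings and search criteria the real database calls for.

```python
import re
from collections import Counter
from typing import List, Tuple

# Stage 1: hypothetical examples of strings to eliminate before extraction.
NOISE = [
    re.compile(r"<[^>]*>"),        # markup tags
    re.compile(r"https?://\S+"),   # URLs
    re.compile(r"\d+"),            # numbers
]

# Stage 2: hypothetical keyword criterion, 4+ letters, hyphen-joined parts allowed.
KEYWORD = re.compile(r"\b[A-Za-z]{4,}(?:-[A-Za-z]+)*\b")

def reduce_text(text: str) -> str:
    """Replace each noise string with a space so neighbouring words stay separate."""
    for pat in NOISE:
        text = pat.sub(" ", text)
    return text

def build_index(text: str) -> List[Tuple[str, int]]:
    """Count keywords in the reduced text and sort them case-insensitively."""
    counts = Counter(KEYWORD.findall(reduce_text(text)))
    return sorted(counts.items(), key=lambda kv: kv[0].lower())
```

Running the cheap deletions first keeps the extraction regex simple. The expensive part is then a single pass over text that is already smaller.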
I used the ^$IsAlphaNumeric$ operator you mentioned, but it doesn't
select hyphenated compounds like "Hewlett-Packard", since the
uppercase letter at the beginning is followed by another uppercase
letter. So I'm working with ^$IsUppercase(^$StrIndex("Str";1)$)$ instead.
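For illustration only, here is a Python sketch of the same idea: keep words whose first letter is uppercase, including hyphenated compounds. The names `CAPITALIZED`, `capitalized_words` and `starts_uppercase` are hypothetical. The regex uses ASCII `[A-Z]`, so accented capitals would need a Unicode-aware pattern. The per-token check uses `str.isupper`, which does handle them.

```python
import re
from typing import List

# A capital letter, the rest of the word, then any number of
# hyphen-joined parts in either case, e.g. "Hewlett-Packard".
CAPITALIZED = re.compile(r"\b[A-Z][A-Za-z]*(?:-[A-Za-z]+)*")

def capitalized_words(text: str) -> List[str]:
    """Return every capitalized word or hyphenated compound in `text`."""
    return CAPITALIZED.findall(text)

def starts_uppercase(token: str) -> bool:
    """Per-token test: is the first character uppercase?
    Slicing avoids an IndexError on empty tokens."""
    return token[:1].isupper()
```

The regex handles the hyphen explicitly, so "Hewlett-Packard" comes back as one match rather than two.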
Any more ideas would be highly appreciated...
PS: Hi Jody! Thanks for your comment; as you can see, I'm still
working on that issue.
Flo ;-)