12888 Re: Extracting words from a file
Dec 1, 2004
franz,
Are you using Pro or Light? That makes quite a difference in speed.
What about this approach? You can easily see for yourself if this is
of any help.
1. replace " " with "^P" - don't know how fast that would be
2. trim/left align the text (which should have most words on a
separate line by now)
3. sort the document with [Case Sensitive Sorting] and [Remove
Duplicates] switched on (in options)
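
If you want to sanity-check the result outside the editor, the three steps above can be sketched in a few lines of Python. This is only a rough equivalent, not what the editor does internally: the function name unique_sorted_words and the sample text are made up for illustration, splitting on whitespace stands in for the " " to "^P" replacement, and Python's code point ordering is assumed to be close enough to case-sensitive sorting.

```python
def unique_sorted_words(text):
    """Return the distinct whitespace-separated words of text, sorted case-sensitively."""
    # Steps 1 and 2: split() breaks the text on any run of whitespace and
    # drops leading/trailing blanks, so each word ends up as its own item.
    words = text.split()
    # Step 3: set() removes duplicates; sorted() compares by code point,
    # so uppercase letters sort before lowercase ones.
    return sorted(set(words))


# Hypothetical sample text, one word printed per line.
sample = "the Index of the index and The thesaurus"
for word in unique_sorted_words(sample):
    print(word)
```

Note that punctuation stays attached to the words here (just as it would after a plain space replacement), so "index," and "index" would count as two different entries.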
> Maybe a mixture of both models would be the best solution. That is,
> first reduce the file by eliminating certain strings, and then
> extract the words I need. (The purpose of all this is to produce an
> index or thesaurus of keywords in a text database.)
> I used the ^$IsAlphaNumeric$ operator you mentioned, but this
> selects compounds with a hyphen like "Hewlett-Packard", since the
> uppercase letter at the beginning is followed by another uppercase
> letter. So I'm working with ^$IsUppercase(^$StrIndex("Str";1)$)$.