
12888 Re: Extracting words from a file

  • Hugo Paulissen
    Dec 1, 2004

      Are you using Pro or Light? That makes quite a difference in speed.

      What about this approach? You can easily see for yourself if this is
      of any help.

      1. Replace " " with "^P" (I don't know how fast that would be).
      2. Trim/left-align the text, which should have most words on a
      separate line by now.
      3. Sort the document with [Case Sensitive Sorting] and [Remove
      Duplicates] switched on (in Options).
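
For anyone who wants to try the same idea outside the editor, here is a rough Python sketch of the three steps above. It is not NoteTab code. The function name `unique_words` is made up for illustration, and Python's default string ordering (uppercase before lowercase) is only assumed to be close to what [Case Sensitive Sorting] does. Unlike the editor, the sketch also drops empty lines, and punctuation stays attached to the words.

```python
def unique_words(text):
    """Return the distinct space-separated words of `text`, sorted case-sensitively."""
    # Step 1: replace every space with a line break, so most words
    # end up on a line of their own.
    lines = text.replace(" ", "\n").split("\n")

    # Step 2: trim each line; empty lines (from double spaces or
    # blank lines) are dropped here, which is a choice of this sketch.
    words = [line.strip() for line in lines if line.strip()]

    # Step 3: remove duplicates and sort. "Cat" and "cat" count as
    # different words, and uppercase sorts before lowercase.
    return sorted(set(words))


if __name__ == "__main__":
    sample = "the cat  the Cat\n dog"
    for word in unique_words(sample):
        print(word)
```

On a large file this should run quickly, since it is a single pass over the text followed by one sort of the distinct words.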


      > Maybe a mixture of both models would be the best solution. That is,
      > first to reduce the file by eliminating certain strings, and then
      > extracting the words I need. (The use of all this is to produce an
      > index or thesaurus of keywords in a text database.)
      > I used the ^$IsAlphaNumeric$ operator you mentioned but this
      > selects hyphenated compounds like "Hewlett-Packard" since the
      > uppercase letter at the beginning is followed by another uppercase
      > letter. So I'm working with ^$IsUppercase(^$StrIndex("Str";1)$.