  • franz_sternbald
    Dec 1, 2004

      Thanks for the solutions you presented here...


      Actually, there are two different ways to do this job: 1. extract
      the words you want, or 2. delete the words you don't want. The
      problem with #2 is this: since I'm evaluating text databases of
      500 KB, 1 MB or more, I would have to delete an enormous number of
      characters and strings that don't match the search criteria. This
      would take dozens of command lines and RegExes to reduce the
      file. So I tried it the other way round, i.e. by extracting the
      matching words only.


      Using the Pasteboard function is a clever solution! With files > 500
      KB, however, it takes an intolerably long time. So far, no error
      message has shown up, but I stopped the procedure after half an hour.

      Maybe a mixture of both approaches would be the best solution: first
      reduce the file by eliminating certain strings, then extract the
      words I need. (The purpose of all this is to produce an index or
      thesaurus of keywords in a text database.)
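
      To make the two-step idea concrete, here is a rough sketch in Python
      (not NoteTab Clip code). The pattern, the function name build_index
      and the stop_strings argument are all made up for illustration; the
      real search criteria would differ. It reduces the text first, then
      extracts capitalized words (including hyphenated ones) in one pass.

```python
import re
from collections import Counter

# Hypothetical pattern: a word starting with an uppercase letter,
# optionally continued by hyphen-joined parts ("Hewlett-Packard").
KEYWORD = re.compile(r"\b[A-Z][A-Za-z0-9]*(?:-[A-Za-z0-9]+)*\b")

def build_index(text, stop_strings=()):
    """Return a Counter mapping each extracted keyword to its frequency."""
    # Step 1 (reduce): blank out strings known to be irrelevant.
    for s in stop_strings:
        text = text.replace(s, " ")
    # Step 2 (extract): collect every match while scanning the text once.
    return Counter(m.group(0) for m in KEYWORD.finditer(text))

sample = "Printers from Hewlett-Packard and Epson. See also Hewlett-Packard drivers."
index = build_index(sample, stop_strings=["See also"])
for word in sorted(index):
    print(word, index[word])
```

      Extracting with a single scan avoids writing one delete rule per
      unwanted string; the reduce step is only needed for strings that
      would otherwise match the pattern by accident.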

      I used the ^$IsAlphaNumeric$ function you mentioned, but this wouldn't
      select hyphenated compounds like "Hewlett-Packard" since the
      uppercase letter at the beginning is followed by another uppercase
      letter. So I'm working with ^$IsUppercase(^$StrIndex("Str";1)$)$.
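
      For comparison, the same "test only the first character" idea as a
      small Python analogue (again not Clip code; the helper names are
      hypothetical). In Python, at least, a whole-token alphanumeric check
      rejects "Hewlett-Packard" because of the hyphen, while a check on
      the first character alone accepts it. NoteTab's functions may not
      behave identically.

```python
def starts_uppercase(token):
    # token[:1] is the first character, or "" for an empty token,
    # so this never raises on empty input.
    return token[:1].isupper()

def is_plain_alnum(token):
    # True only if every character is a letter or digit.
    return token.isalnum()

for token in ["Hewlett-Packard", "Epson", "printer", ""]:
    print(repr(token), is_plain_alnum(token), starts_uppercase(token))
```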

      Any more ideas would be highly appreciated...


      PS Hi Jody! Thanks for your comment - as you can see, I'm still
      working on that issue. Flo ;-)
