
12902 Re: Extracting words from a file

  • abairheart
    Dec 3, 2004
      --- In ntb-clips@yahoogroups.com, "franz_sternbald"
      <franz_sternbald@y...> wrote:
      > (The use of all this is to produce an
      > index or thesaurus of keywords in a text database.)

      Hi Franz,

      I just happened across this thread. If I have understood your needs
      correctly, why not just reduce the list to a single column of words
      and sort them case sensitive?

      1. Replace all spaces in the document with "^P" to change the list to
      individual words (ignore punctuation, if you like).

      2. Sort the list CASE SENSITIVE

      3. Delete the lower case words
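
For anyone who would rather script it than do it by hand, the three steps above could be sketched roughly like this in Python. This is only an illustration, not a NoteTab clip: the function name `capitalized_keywords` and its two options are made up for the example, and it assumes "lower case words" means words whose first character is a lower case letter.

```python
import string


def capitalized_keywords(text, strip_punctuation=True, unique=True):
    """Hypothetical helper mirroring the three manual steps."""
    # Step 1: one word per entry (splitting on whitespace plays the role
    # of replacing spaces with line breaks).
    words = text.split()
    if strip_punctuation:
        # Optional: drop punctuation stuck to the start or end of a word.
        words = [w.strip(string.punctuation) for w in words]
        words = [w for w in words if w]

    # Step 2: case-sensitive sort. Python compares code points, so
    # "Zebra" sorts before "apple".
    words.sort()

    # Step 3: delete the words that start with a lower case letter
    # (assumption: this is what "lower case words" means here).
    kept = [w for w in words if not w[0].islower()]

    if unique:
        # Remove duplicates while keeping the sorted order.
        kept = list(dict.fromkeys(kept))
    return kept


if __name__ == "__main__":
    sample = "The quick brown Fox jumps over the lazy Dog. The end."
    for word in capitalized_keywords(sample):
        print(word)
```

The `unique` flag stands in for the "remove duplicates or not" choice; set it to `False` to keep repeated words.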

      A 500 K file should contain about 80,000 words or so. It shouldn't take
      more than a few minutes to do this by hand. If you have a lot of
      files you can always write down the keystrokes you use, then do the
      sort by Menu commands (^!Menu Modify/...). I think there's a
      configuration switch to change sorting behaviour (remove duplicates
      or not; case sensitive or not).
