Word Frequency Tool enhancement (keyword generation clip)

  • dan@fairness.com
    Message 1 of 3 , Jun 5 8:37 AM
      I am a long-time NoteTab Pro (4.85) user that hasn't yet
      made the leap into clip programming... I now have an
      urgent need for a clip, one that's probably (i) not too hard
      to write and (ii) should be of use to others as well.

      For my soon-to-launch website I'm spending too much
      time scanning articles for keywords. I've used NoteTab's
      word frequency tool to help, but it takes a long time to strip
      out all the junky words (a, an, the, etc.).

      I realized it should be possible for a clip to enhance
      NoteTab's word frequency tool to generate keywords for
      articles I'm writing (and other articles too) much more
      quickly.

      Here's what I'm visualizing:

      1) take full text I've cut and pasted from another
      source

      2) run it through NoteTab's word frequency tool
      (Tools | Text Statistics | More)


      Now for a couple of enhancements:

      3) from the word frequency list generated above,
      remove all words that appear on an "excluded words list"
      (case insensitive!) containing words such as "a", "an",
      "the", "is", "or", etc. I'm guessing those words would be
      stored in either a separate text file or perhaps even in the
      clip itself... but in a way that's (a) easy to make
      additions to and deletions from the excluded word list,
      and (b) doesn't require any software other than NoteTab
      and Windows. [Of course if someone can provide this clip
      for me, I'll gladly compile a comprehensive "excluded
      word" list and donate it back to the group for others who
      want to use such a clip.]

      4) select words that appear *only* in initial capital
      letters, as a way to identify person and place names
      (proper nouns, my English teachers of past years would
      want me to say!) that are likely candidates for being
      keywords. These names would be put at the top of the
      frequency list. A few random words would be capital-only,
      but I don't expect that to be a problem.

      A "wish-list" item that is not critical, but...

      5) if possible, consecutive words that all appear
      *only* in initial-letter-capitalized form (e.g. "George Bush",
      george never appears, just George) would be placed
      together on the same line in the word frequency list, e.g. it
      would be good to have an entry on the list "George Bush"
      in addition to the individual entries "Bush" in the B's and
      "George" in the G's. [Why? In some searches users hunt
      for "George Bush", and my site's search algorithm would
      require those words to be consecutive in the keywords
      list for that query to be successful.]


      Any help or ideas would be greatly appreciated. If a clip
      is not the way to go, feel free to let me know that as well.

      Thanks!!
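For readers who want to see the logic of steps 1 to 3 outside NoteTab, here is a minimal Python sketch. It is not a clip and it is not NoteTab's own word frequency tool. The function name `keyword_frequencies` and the short `EXCLUDED` set are made up for illustration; a real excluded-word list could just as well live in a plain text file with one word per line.

```python
import re
from collections import Counter

# Hypothetical excluded-word list, kept short for the example.
EXCLUDED = {"a", "an", "the", "is", "or", "and", "of", "to"}


def keyword_frequencies(text, excluded=EXCLUDED):
    """Count words in text, skipping excluded words regardless of case."""
    words = re.findall(r"[A-Za-z']+", text)
    excluded_lower = {w.lower() for w in excluded}
    counts = Counter(
        w.lower() for w in words if w.lower() not in excluded_lower
    )
    # Most frequent first; ties are broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


sample = "The cat and the dog. A cat is a cat, or is it?"
result = keyword_frequencies(sample)
for word, count in result:
    print(count, word)
```

Lowercasing both the text and the list is what makes the exclusion case insensitive, as asked for in step 3. It also merges "Cat" and "cat" into one entry, which would need to change for step 4.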
    • Jody
      Message 2 of 3 , Jun 5 10:40 AM
        Hi Dan,

        >For my soon-to-launch website I'm spending too much
        >time scanning articles for keywords. I've used NoteTab's
        >word frequency tool to help, but it takes a long time to strip
        >out all the junky words (a, an, the, etc.).

        You can write the script yourself. You just need to make a
        series of Replace commands. An array may make it easier in the
        long run, and perhaps better, because it is easier to add or take
        away words quickly, but it really would not be hard using copy/paste
        of the Replace command. It would look like the following:

        ^!Replace "a" >> "" WAS
        ^!Replace "an" >> "" WAS
        ^!Replace "and" >> "" WAS
        ^!Replace "the" >> "" WAS
        ; just keep adding to the list. You might just paste a number
        ; of ^!Replace "" >> "" WAS to make it easy to add the word you
        ; want removed. End it with:
        ^!Replace "  " >> " " WAS
        ^!Replace "^p " >> "^p" WAS
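As a rough illustration of what that chain of whole-word replacements does to a document, here is a small Python sketch. It is not clip code, and the helper name `remove_words` is hypothetical. It assumes the intent is to delete each listed word wherever it stands alone, in any letter case, and then tidy up the leftover spaces.

```python
import re


def remove_words(text, words):
    """Delete each listed word where it appears as a whole word, any case."""
    # The \b anchors keep "a" from matching inside "an" or "cat".
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
    stripped = re.sub(pattern, "", text, flags=re.IGNORECASE)
    # Collapse runs of spaces left behind by the deletions.
    stripped = re.sub(r" {2,}", " ", stripped)
    # Drop a single leading space at the start of any line.
    return re.sub(r"(?m)^ ", "", stripped)


cleaned = remove_words("The cat and an ant\nA dog", ["a", "an", "and", "the"])
print(cleaned)
```

The whole-word match is the important part: without it, removing "a" would also eat the letter out of every other word.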

        If you want to go with an array, it would look like the following,
        which I copied from a larger Clip I use that corrects capitalization
        in titles.

        <--- Copy below this line --->
        H=Delete Little Words
        ; Last Updated 06-05-2001, Sojourner@..., jody
        ; Removes the words below from a document
        ; Add or remove words, acronyms, etc. as you prefer them.
        ; Also get H=Word Extract & Sort (03-13-2000) at Snatch-A-Clip
        ; http://www.notetab.net/html/snatchclp.htm to alphabetize
        ; Direct link http://www.notetab.net/html/extrsort.htm
        ; (I just noticed it also deletes small words. Oh well. ;)

        ^!SetListDelimiter ,^%Space%
        ^!SetArray %Words%=a, an, and, as, at, be, but, by, do, for, if, in, is, it, nor, of, off, on, or, out, the, to, too, I, II, III, IV, V, VI, VII, VIII, IX, X, XI, XII, XIII, XIV, XV, XVI, XVII, XVIII, XIX, XX, XXI, XXII, XXIII, XXIV, XXV, XXVI, XXVII, XXVIII, XXIX, XXX, ABC, AOL, ATM, ATT, CBS, ESPN, GM, HBO, IBM, ISBN, ISP, LOL, MCI, OK, NBA, NBC, NT, UPS, USA, USAF, USMC, USN, WWW, Y2K, YMCA, PMS, MS, MSN, ;-)
        ^!Set %Count%=^%Words0%
        ^!Set %Index%=0

        :Loop
        ^!Inc %Index%
        ^!If ^%Index% > ^%Count% Skip_2
        ^!Replace "^%Words^%Index%%" >> "" WAS
        ^!Goto Loop
        ^!Replace "  " >> " " WAS
        ^!IfError Next else Skip_-1

        <--- Copy above this line, right --->
        <--- click over a Library, and --->
        <--- choose "Add from Clipboard" --->

        I think those Clips should do all that you need. The one on my
        Snatch-A-Clip page probably will do it all. It was made for
        exactly what you are doing.
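The clips above deal with the excluded words. For wish-list items 4 and 5 (capital-only words and consecutive runs of them), the logic might look something like the Python sketch below. This is only an illustration of the idea, not a clip, and `proper_noun_candidates` is a hypothetical name. It assumes a "name" is any word whose only spelling in the text is initial-capitalized, and that punctuation breaks a run.

```python
import re


def proper_noun_candidates(text):
    """Return (names, phrases): capital-only words and runs of them."""
    # Words, plus single punctuation marks so they can break a run.
    tokens = re.findall(r"[A-Za-z]+|[^A-Za-z\s]", text)

    # Collect every spelling seen for each word, keyed by lowercase form.
    forms = {}
    for tok in tokens:
        if tok.isalpha():
            forms.setdefault(tok.lower(), set()).add(tok)

    # Keep words whose only observed spelling is initial-capitalized.
    names = set()
    for seen in forms.values():
        if len(seen) == 1:
            (only,) = seen
            if only.istitle():
                names.add(only)

    # Join consecutive names into phrases such as "George Bush".
    phrases = set()
    run = []
    for tok in tokens + ["."]:  # the sentinel flushes the final run
        if tok in names:
            run.append(tok)
        else:
            if len(run) > 1:
                phrases.add(" ".join(run))
            run = []
    return sorted(names), sorted(phrases)


sample = (
    "George Bush spoke today. Reporters asked George Bush about taxes, "
    "and the reporters took notes."
)
names, phrases = proper_noun_candidates(sample)
print(names, phrases)
```

"Reporters" is dropped here because "reporters" also appears in lowercase. A word that only ever opens a sentence would still slip through, which matches the "few random words" Dan says he can live with.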



        Happy Clip'n!
        Jody

        http://www.notetab.net

      • cmichaelbeck@hotmail.com
        Message 3 of 3 , Jun 11 12:05 PM
          This would be a VERY COOL THING. I'm just beginning to play with
          clips and looking to build/find the same clip for myself.

          Trey
