
need find-phrases-in-text tool

  • Nick Berry
    Message 1 of 5, Oct 5, 2012
      Hi, I have posed this question before, but still haven't found anything. Does anyone know of a tool, free or paid, that will extract phrases from a body of text? It needs to work on large documents, and ideally on a corpus of documents (a directory of files). It needs to find single words and phrases of 2-5 words, and *not* double-count the words that belong to phrases. It needs to be user-friendly (i.e., not CLI- or regex-based).

      Thanks to whoever points me to this holy grail.

      Cheers,
      Nick

      Nicholas Berry
      Program Manager, Information Architecture
      Amazon.com
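
For readers who can tolerate a script, the counting rule in the request above (words plus phrases of 2-5 words, with no double-counting) can be sketched roughly as follows. This is only an illustration under stated assumptions, not a tool anyone in the thread recommended: a "phrase" is assumed to be any 2-5 word sequence seen at least `min_count` times, phrases are matched longest-first, and the function names and the `*.txt` pattern are made up for the example.

```python
import re
from collections import Counter
from pathlib import Path

TOKEN_RE = re.compile(r"[A-Za-z0-9']+")

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return [t.lower() for t in TOKEN_RE.findall(text)]

def find_phrases(token_lists, min_n=2, max_n=5, min_count=2):
    """Assumption: a phrase is any n-gram (2-5 words) seen at least min_count times."""
    counts = Counter()
    for tokens in token_lists:
        for n in range(min_n, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return {gram for gram, c in counts.items() if c >= min_count}

def count_terms(token_lists, phrases, max_n=5):
    """Greedy longest-match count: words consumed by a phrase are not counted again."""
    totals = Counter()
    for tokens in token_lists:
        i = 0
        while i < len(tokens):
            # Try the longest candidate phrase first, down to two words.
            for n in range(min(max_n, len(tokens) - i), 1, -1):
                gram = tuple(tokens[i:i + n])
                if gram in phrases:
                    totals[" ".join(gram)] += 1
                    i += n
                    break
            else:
                # No phrase starts here, so count the single word.
                totals[tokens[i]] += 1
                i += 1
    return totals

def count_directory(path, pattern="*.txt"):
    """Run both passes over every matching file in a directory (hypothetical helper)."""
    docs = [tokenize(p.read_text(encoding="utf-8", errors="ignore"))
            for p in sorted(Path(path).glob(pattern))]
    return count_terms(docs, find_phrases(docs))
```

Calling `count_directory("some_folder")` returns a `Counter` whose keys are single words and space-joined phrases. A frequency threshold alone will also pick up uninteresting sequences such as "of the", so a stop-word filter would likely be needed in practice.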


    • Julie Vittengl
      Message 2 of 5, Oct 5, 2012

        Hi Nick,

        I've used this before and found it pretty useful.

        This counts words:
        http://www.writewords.org.uk/word_count.asp

        This counts phrases:
        http://www.writewords.org.uk/phrase_count.asp

        Regards,
        Julie





      • Dalia Levine
        Message 3 of 5, Oct 5, 2012
          Hi Nick:
          The automatic classification tool we use on our repository of documents does what you describe. It looks for phrases in the text and weights each one based on where it is found. The tool is Smartlogic's Semaphore. Its classification server performs that function, while the other part of the tool manages the taxonomy.

          Are you looking for text analysis? Or are you looking to find text within a large file share and then perform a function on it (for example, find the phrase and then tag the document so that search can find it)?
          I hope that helps.
          Dalia Levine


          From: Julie Vittengl <julievittengl@...>
          To: taxocop@yahoogroups.com
          Sent: Friday, October 5, 2012 1:28 PM
          Subject: RE: [TaxoCoP] need find-phrases-in-text tool

           

          Hi Nick,

          I've used this before and found it pretty useful.

          This counts words:
          http://www.writewords.org.uk/word_count.asp

          This counts phrases:
          http://www.writewords.org.uk/phrase_count.asp

          Regards,
          Julie







        • aredmondneal
          Message 4 of 5, Oct 6, 2012
            Hi, Nick,
            Data Harmony's M.A.I. does what you ask. It proposes appropriate metadata when it spots any word, part of a word, or phrase (exact or fuzzy) that you identify as meaningful. That metadata may be held as a taxonomy in the partner tool Thesaurus Master (the combo is MAIstro), or M.A.I. can simply draw on your defined controlled vocabulary.

            Cheers to all,
            Alice
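
The general idea of spotting exact or fuzzy occurrences of controlled-vocabulary terms can be illustrated with a small sketch. This is not how M.A.I. works internally (the thread does not describe that); it is a generic approximation using Python's standard `difflib`, and the function name, the `cutoff` value, and the sample vocabulary are assumptions made for the example.

```python
import difflib
import re

def suggest_terms(text, vocabulary, cutoff=0.85):
    """Map each vocabulary term to the text spans that match it exactly or closely.

    Assumption: a span is only compared with terms that have the same number
    of words, which keeps the fuzzy matching from getting too noisy.
    """
    tokens = re.findall(r"[A-Za-z0-9']+", text.lower())

    # Group the vocabulary by word count: {n: {lowercased term: original term}}.
    by_length = {}
    for term in vocabulary:
        key = term.lower()
        by_length.setdefault(len(key.split()), {})[key] = term

    hits = {}
    for n, terms in by_length.items():
        for i in range(len(tokens) - n + 1):
            span = " ".join(tokens[i:i + n])
            # get_close_matches returns terms whose similarity ratio is >= cutoff.
            for key in difflib.get_close_matches(span, list(terms), n=1, cutoff=cutoff):
                hits.setdefault(terms[key], set()).add(span)
    return hits

hits = suggest_terms(
    "Our information architechture team maintains the taxonomy.",
    ["Information architecture", "Taxonomy"],
)
```

Here the misspelled "information architechture" is still close enough to be proposed for "Information architecture", while "taxonomy" matches exactly. Lowering `cutoff` finds more variants at the cost of more false matches.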


          • Julie Vittengl
            Message 5 of 5, Oct 9, 2012
              Hi Nick,

              In addition to the simple free tool below (it doesn't do exactly what you want, but it can find phrases and words in large bodies of text), I know of a paid tool that is in beta now and will soon be generally available. Here's a synopsis:

              Trillium Software has a "business data parser" that does many of the same things we do for names and addresses, but on product descriptions and the like (it goes from unstructured text to structured text). It has a single-line limitation, among others. We are developing an upgraded version that will do the same thing for larger amounts of text (such as insurance adjusters' notes, survey data, and customer service notes).
              We have a few customers using a current prototype, and it is scheduled to be ready for GA in Feb. It will have a UI to help examine sample data, build dictionaries from what it finds in the data, and so on.

              If you are interested I can point you to my contact there.

              Regards,
              Julie


              To: taxocop@yahoogroups.com
              From: julievittengl@...
              Date: Fri, 5 Oct 2012 13:28:20 -0400
              Subject: RE: [TaxoCoP] need find-phrases-in-text tool

               


              Hi Nick,

              I've used this before and found it pretty useful.

              This counts words:
              http://www.writewords.org.uk/word_count.asp

              This counts phrases:
              http://www.writewords.org.uk/phrase_count.asp

              Regards,
              Julie





