Seeking Info on Auto-Indexing Systems

  • C. Robertson
    Message 1 of 11 , Jan 20, 2012
      Thought it was time to de-lurk and ask a question or two.  This is a great group, and I'm already learning a lot from reading the posts.  I don't have a formal background in taxonomy.  In fact, I'm an entrepreneur, but I think to my credit, I understand the great need for taxonomies and the people that work to create them.

      I'm wondering if anyone knows of open-source to inexpensive auto-indexing or tagging systems that are available.  I've gotten the scoop on some, but they are way out of the league ($$$$) of what I can afford at this time at my startup, which is a food-related biz.  Does anyone have any ideas on systems that I might use or that I've overlooked?  I'm also looking for the same in controlled vocabularies that a system would need.  BTW, I have to thank Seth, Patrick Lambe and Heather Hedden here, because I only know about these systems and controlled vocabularies through their sites and books.

      This "accidental taxonomist" looks forward to hearing from anyone!  I'll be happy to provide more info on my startup's needs, too, if necessary.  Thank you.

      C. "Rob" Robertson
      crobertsonuno@...
      crobertsonuno@...


    • John O'Gorman
      Message 2 of 11 , Jan 20, 2012
      Hi Rob;
       
      I'm not sure if this is what you are looking for (and I would be interested in seeing your startup's needs), but have a look at Concordance: http://www.concordancesoftware.co.uk/
       
       
      It's a good place to start to see what indexing tools produce. On the off-chance you want to see how I manage something called 'foundational taxonomy', please have a look at this article:  http://web.fumsi.com/go/article/manage/4528
      and the attached document for more information.
       
      Let me know what you think and if this is of any use to you.
       
      Cheers.
       
      John O'
       
       

    • David Riecks
      Message 3 of 11 , Jan 20, 2012
        At 11:38 AM 1/20/2012, C. Robertson wrote:
        >I'm wondering if anyone knows of open-source to inexpensive auto-indexing or tagging systems that are available.

        Rob:

        What kind of items are you tagging?  Images, Video, ebooks, text, or other media?

        Are you looking for applications that can embed or bind the information to a file, or applications that can be used to manage a taxonomy / controlled vocabulary?

         I'm mostly familiar with images, and have listed a number of applications that can be used to "annotate" images on my site... see http://www.controlledvocabulary.com/imagedatabases/programs.html for details. Applications for managing keyword catalogs that can be used to tag images are covered as well on http://www.controlledvocabulary.com/metalogging/metalog_resources.html under the "programs" section.

        >I've gotten the scoop on some, but they are way out of the league ($$$$) of what I can afford at this time at my startup, which is a food-related biz.

        If you are dealing with file formats other than images, but are housing them in a database that has a field for "keywords", then you can still use some of the tools mentioned above to generate the "tags" -- pulling them from a hierarchical structure -- and then "paste" the results into your own database.  As one example, see the short video "How to use the Controlled Vocabulary Keyword Catalog (CVKC) as a Keyword Generator" at http://www.controlledvocabulary.com/movies.html
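Generating tags by pulling them from a hierarchical structure can be sketched in a few lines of Python. This is a minimal sketch, not how the CVKC or any other product works internally. It assumes each term records exactly one broader term (a simple tree), and that picking a narrow term should also bring along all of its broader terms before the whole list is joined into one string for a "keywords" field. The vocabulary and the names `BROADER`, `with_broader_terms` and `keyword_string` are hypothetical.

```python
# Hypothetical fragment of a food vocabulary: each term maps to its broader term (None = top level).
BROADER = {
    "Food": None,
    "Dessert": "Food",
    "Cake": "Dessert",
    "Cheesecake": "Cake",
    "Comfort food": "Food",
}

def with_broader_terms(term, broader=BROADER):
    """Return the term followed by each of its broader terms, up to the top level."""
    chain = []
    while term is not None:
        if term not in broader:
            raise KeyError(f"{term!r} is not in the vocabulary")
        chain.append(term)
        term = broader[term]
    return chain

def keyword_string(selected, broader=BROADER, sep="; "):
    """Expand every selected term, drop duplicates while keeping order, and join for a 'keywords' field."""
    seen = []
    for term in selected:
        for t in with_broader_terms(term, broader):
            if t not in seen:
                seen.append(t)
    return sep.join(seen)

print(keyword_string(["Cheesecake", "Comfort food"]))
```

Joining with "; " is only one choice; use whatever delimiter the target database or tagging tool expects.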

        >This "accidental taxonomist" looks forward to hearing from anyone!  I'll be happy to provide more info on my startup's needs, too, if necessary.  Thank you.

        That might help, as right now, I'm having to make a number of assumptions, the majority of which are likely to be wrong. ;-)

        David

        --
        David Riecks  (that's "i" before "e", but the "e" is silent)
        Need Keywords for your database? Get the Controlled Vocabulary Solution
        http://controlledvocabulary.com/products/ support for a dozen of the
        most popular imaging applications from Adobe Bridge to Photo Mechanic.

      • crobertsonuno
        Message 4 of 11 , Jan 25, 2012
          Thanks for this info.

          I am hoping to read free form text and then establish appropriate tags based on a controlled vocabulary.
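For plain text, the simplest version of this is a lookup: each preferred term in the controlled vocabulary lists the entry terms (synonyms and variants) that should map to it, and a document receives the preferred term whenever one of its entry terms appears. Below is a minimal Python sketch under that assumption; the vocabulary, `build_patterns` and `tag` are hypothetical, and real auto-indexing tools layer weighting, rules and disambiguation on top of this.

```python
import re

# Hypothetical controlled vocabulary: preferred term -> entry terms that should map to it.
VOCABULARY = {
    "Comfort food": ["comfort food", "mac and cheese", "meatloaf"],
    "Haute cuisine": ["haute cuisine", "fine dining", "tasting menu"],
    "Fast food": ["fast food", "drive-thru", "burger joint"],
}

def build_patterns(vocabulary):
    """Compile one case-insensitive, whole-word regex per preferred term."""
    patterns = {}
    for preferred, entry_terms in vocabulary.items():
        alternatives = "|".join(re.escape(t) for t in entry_terms)
        patterns[preferred] = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    return patterns

PATTERNS = build_patterns(VOCABULARY)

def tag(text, patterns=PATTERNS):
    """Return the preferred terms whose entry terms appear in the text, in vocabulary order."""
    return [preferred for preferred, pattern in patterns.items() if pattern.search(text)]

print(tag("Best meatloaf in town, though the drive-thru line was long."))
```

Because tags come only from the vocabulary, text that mentions none of the entry terms simply gets no tags, which makes gaps in the vocabulary easy to spot.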

        • David Riecks
          Message 5 of 11 , Jan 25, 2012
            At 01:03 PM 1/25/2012, crobertsonuno wrote:

            >Thanks for this info.
            >
            >I am hoping to read free form text and then establish appropriate tags based on a controlled vocabulary.

            Rob:

            It's still not very clear "what" you are tagging, and within what type of environment.

            From your comment above, can I assume that I'm correct in thinking that you are wanting to tag some kind of "text" document?

            If so, are these PDFs, Word documents, or text files?

            How are you storing these, and how do you intend to access / discover them?  (local machine, network, cloud?)

            Do you intend to "catalog" into a general or specific database of some sort?  Applications such as Canto Cumulus and even Expression Media can "catalog" PDF and some limited "text" document types. However, the means to "embed" any information in the file itself -- information that can travel with the file -- is limited by the specific file format.

            If these documents are primarily for reading, then I would recommend converting to PDF as this format does allow embedding information, and is supported by a wide variety of databases.

            If you do not have a specific database in mind, and/or are looking at storing them in a manner where others can access them (online database?), then it would help to know that.  Any details you can provide would be very useful.

            Some operating systems (like Mac OS) do have features that index the files and can see this form of embedded metadata on a local machine or connected local area network. On the Mac this feature is called Spotlight.

            My experience is in "tagging" images by constructing strings of keywords. This could be done for various types of text documents as well -- using a controlled vocabulary -- but it's a manual process. To see one method of generating keywords that I demonstrated for a client yesterday, have a look at the following short video: "Using the CVKC Controlled Vocabulary Keyword Catalog as a Keyword Generator with Photo Mechanic and Expression Media"

            http://vimeo.com/35592159

            If that's what you are after, it's possible to use a feature like the Structured Keyword Catalog in Photo Mechanic, or the Keyword Catalog in Image Info Toolkit, and then "paste" those terms into a field in a database or the actual file itself.
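If the terms end up in a database field, the "paste" step might look like the following minimal sketch using Python's built-in `sqlite3` module. The table, column and function names are hypothetical. Storing all keywords in one delimited field mirrors the single "keywords" field mentioned earlier in the thread; a separate document-to-term table would be more robust for a large catalog.

```python
import sqlite3

def open_catalog(path=":memory:"):
    """Create (if needed) a document table with one free-text 'keywords' field."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL, keywords TEXT NOT NULL DEFAULT '')"
    )
    return conn

def save_keywords(conn, doc_id, title, keywords, sep="; "):
    """'Paste' a list of vocabulary terms into the keywords field, replacing any earlier row."""
    conn.execute(
        "INSERT OR REPLACE INTO documents (id, title, keywords) VALUES (?, ?, ?)",
        (doc_id, title, sep.join(keywords)),
    )
    conn.commit()

def find_by_keyword(conn, term, sep="; "):
    """Return titles whose keywords field contains `term` as a whole entry, not a substring."""
    # Padding both sides with the separator keeps 'Cake' from matching 'Cheesecake'.
    # LIKE is case-insensitive for ASCII, and '%' or '_' inside `term` would act as wildcards.
    rows = conn.execute(
        "SELECT title FROM documents "
        "WHERE ? || keywords || ? LIKE '%' || ? || ? || ? || '%' ORDER BY id",
        (sep, sep, sep, term, sep),
    )
    return [title for (title,) in rows]

conn = open_catalog()
save_keywords(conn, 1, "Cheesecake bars", ["Cheesecake", "Dessert"])
save_keywords(conn, 2, "Sponge cake basics", ["Cake", "Dessert"])
print(find_by_keyword(conn, "Cake"))
```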

            However, in your original post you mention "auto indexing" -- whereas above you talk about "reading free form text and then establishing appropriate tags" so it's not clear if you are looking for a system that will do all the work for you or not.

            There are some systems that have "full text search" but I have to admit that this is outside my area of expertise, so perhaps someone else can step up and fill you in on the options there.

            If that is the case, I'm sure that more information from you regarding what types of files, and what types of systems you are using would be very useful.

            Hope that helps.

            David


          • Claude
            Message 6 of 11 , Jan 28, 2012
              --- In TaxoCoP@yahoogroups.com, "crobertsonuno" <crobertsonuno@...> wrote:
              > I am hoping to read free form text and then establish appropriate tags based on a controlled vocabulary.

              There are a number of auto-indexing tools out there, but they're typically not free.

              In addition to the well-known names (Data Harmony's MAI, Concept Searching, etc., etc.) there is also an interesting startup that capitalizes on the work of the MIT Media Lab on concept networks. They're called Luminoso (www.lumino.so) and have an "Insight Engine" that can explore text and is used for various purposes including sentiment analysis. The interesting novelty is that because they capitalize on a very large corpus of common sense inferences from text (the original research work at MIT from which they spun off), they do NOT need a controlled vocabulary.

              Here's how they describe it: "Luminoso is a text understanding startup with expertise in analyzing unstructured text with uncontrolled vocabularies, capable of devising custom solutions for unique problems. Extracting useful insights and summary tags from survey responses and product reviews are common tasks for this team. They are also startup-friendly and would likely offer a solution acceptable to a small company."

              (The last sentence is something they wrote when I asked them how I should describe them to this forum).

              If anyone wants contact information besides what you will find on their Web site, let me know.
            • Fran Alexander
              Message 7 of 11 , Jan 29, 2012
                The Lumino.so project looks very interesting, but I couldn't figure out how their approach differs from using ontologies, which they say are brittle, complex, go out of date, and are prescriptive.

                It looked to me like they use one big general language ontology - their ConceptNet. Do you know what makes ConceptNet different from other ontologies?

                Fran




              • Claude
                Message 8 of 11 , Jan 29, 2012
                  Fran,

                  Thanks for the question. I don't think I have an authoritative answer, but I've told them about your remark and I expect them to send me something in reply.

                  Having talked for several years to Prof. Henry Lieberman at the MIT Media Lab, under whom the ConceptNet work was done, I know that one characteristic was that the concept network was built from the CommonSense project, in which people entered, on a public Web site, statements describing everyday knowledge of the relationships between things. Given my background in the oilfield services industry, we often discussed the fact that people could enter statements like "oil is used for cooking," "oil is used in an engine," and "oil is found underground" and (in combination with many other statements, of which the project accumulated hundreds of thousands if not millions), the system would end up knowing that there were several kinds of things called "oil," and what some of their respective properties were.
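As a toy illustration of that idea (not how ConceptNet or Luminoso actually infer word senses), assertions can be stored as (concept, relation, concept) triples, and each property of a concept can be traced to the more specific kinds that share it. The relation names follow ConceptNet's style (`UsedFor`, `AtLocation`, `IsA`), but the data and the function names are made up for this sketch.

```python
from collections import defaultdict

# Made-up commonsense assertions in the style Claude describes: (concept, relation, concept).
ASSERTIONS = [
    ("oil", "UsedFor", "cooking"),
    ("oil", "UsedFor", "engine"),
    ("oil", "AtLocation", "underground"),
    ("olive oil", "IsA", "oil"),
    ("olive oil", "UsedFor", "cooking"),
    ("motor oil", "IsA", "oil"),
    ("motor oil", "UsedFor", "engine"),
]

def index(assertions):
    """Map each concept to the set of (relation, other concept) features it has."""
    features = defaultdict(set)
    for subject, relation, obj in assertions:
        features[subject].add((relation, obj))
    return features

def kinds_explaining(concept, assertions):
    """For each feature of `concept`, list the more specific kinds (X IsA concept) that share it."""
    features = index(assertions)
    kinds = sorted(s for s, r, o in assertions if r == "IsA" and o == concept)
    return {
        feature: [k for k in kinds if feature in features[k]]
        for feature in sorted(features[concept])
    }

for feature, kinds in kinds_explaining("oil", ASSERTIONS).items():
    print(feature, "->", kinds or "(no specific kind recorded)")
```

With more assertions, properties that split cleanly between different kinds are a hint that one word covers several distinct things.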

                  While this may be interesting background, please note again that this is not Luminoso's own answer to your question, so you should wait for me to relay such an answer before concluding.

                  I'm sure I could also ask them to organize an online presentation if some people on this forum are interested. Let me know...

                  Claude Baudoin
                  cébé IT & Knowledge Management LLC
                  Austin, TX

                • C. Robertson
                  Message 9 of 11 , Jan 29, 2012
                    Thanks, Claude.  Interesting approach.  Will definitely check it out along with all the other info received here.  Thanks, again, David for the image tagging systems info, too.

                    Have been looking at OpenCalais and trying it out on user comments centered around food reviews, recipes and descriptions, but it doesn't quite describe the qualities of foods to the extent that I want.  It'll indicate if it's "fast food" and it picks up types of food, but it doesn't get into specifics and identify whether the food is comfort food, haute cuisine, etc., and it seems to make quite a few mistakes, which is understandable given the nature of this kind of work, but there isn't a way to refine the tagging logic to disambiguate Buffalo Wings from Buffalo, NY!  But I guess that comes with the open source territory.
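Part of the Buffalo Wings / Buffalo, NY problem is matching order, which a home-grown tagger can control. A minimal sketch, assuming a hypothetical list of entry terms: Python's `re` alternation takes the first alternative that matches at a position rather than the longest, so sorting entry terms longest-first lets "buffalo wings" claim its text before "buffalo" can. This does not resolve a bare "Buffalo" that could mean the city, the animal or buffalo mozzarella; that needs context rules or a statistical model.

```python
import re

# Hypothetical entry terms mapped to preferred terms; "buffalo" is a prefix of "buffalo wings".
ENTRY_TERMS = {
    "buffalo wings": "Chicken wings",
    "buffalo": "Buffalo (N.Y.)",
    "wings": "Chicken wings",
    "comfort food": "Comfort food",
}

# Try longer phrases first so a multi-word term wins over the single word inside it.
_ORDERED = sorted(ENTRY_TERMS, key=len, reverse=True)
PATTERN = re.compile(r"\b(?:" + "|".join(re.escape(t) for t in _ORDERED) + r")\b", re.IGNORECASE)

def tag_longest_match(text):
    """Scan left to right; each stretch of text is claimed by the longest entry term that fits."""
    tags = []
    for match in PATTERN.finditer(text):
        preferred = ENTRY_TERMS[match.group(0).lower()]
        if preferred not in tags:
            tags.append(preferred)
    return tags

print(tag_longest_match("Best Buffalo wings I've had outside Buffalo."))
```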

                    David, your info about using a database with a "keywords" field then using one of the tools that you mention to generate the tags and paste in the results is sounding like one of the most viable options.   Thanks for the link to the movie--very insightful.

                    Thanks all - Rob




                  • Fran Alexander
                    Message 10 of 11 , Feb 2, 2012
                      Thanks Claude.

                      It will be interesting to hear what they say. I have been joking about needing an ontology of everything but having to get someone else to build and maintain it! It sounds like they are trying to solve the same problem.

                      Fran



                    • Claude
                      Message 11 of 11 , Feb 13, 2012
                        --- In TaxoCoP@yahoogroups.com, Fran Alexander <franalexander32@...> wrote:
                        > The Lumino.so project looks very interesting, but I couldn't figure
                        > out how their approach differs from using ontologies, which they say
                        > are brittle, complex, go out of date, and are prescriptive.
                        >
                        > It looked to me like they use one big general language ontology -
                        > their ConceptNet. Do you know what makes ConceptNet different from
                        > other ontologies?

                        Fran et al.,

                        I didn't want to put words in their mouths, so I asked, and I got this reply from the Media Lab's Catherine Havasi, who co-founded the company:

                        "The key point is the difference between specific and general ontologies -- a topic area ontology suffers from the brittleness and maintainability problems and may also need to be built in the first place. For instance, computer and technology terms evolve rapidly over time and a standard ontological approach requires human updating. ConceptNet focuses on learning general human knowledge which doesn't change very quickly, permitting Luminoso to learn new language by seeing it used, essentially constructing a domain-specific framework from domain text without human intervention."
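The "learn new language by seeing it used" point can be illustrated, very loosely, with co-occurrence counting: a term the system has never seen is characterized by the already-known concepts that appear near it in domain text. This is a plain Python sketch of that general idea only, not Luminoso's or ConceptNet's method; the known-concept set, the sample reviews and the window size are all assumptions.

```python
import re
from collections import Counter

# Hypothetical stand-in for general background knowledge: concepts the system already "knows".
KNOWN_CONCEPTS = {"cheese", "pasta", "spicy", "chicken", "dessert", "sweet", "bread"}

def associations(new_term, documents, window=4):
    """Count known concepts within `window` words on either side of each use of `new_term`."""
    counts = Counter()
    for doc in documents:
        words = re.findall(r"[a-z']+", doc.lower())
        for i, word in enumerate(words):
            if word == new_term:
                nearby = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
                counts.update(w for w in nearby if w in KNOWN_CONCEPTS)
    return counts

reviews = [
    "The nduja pasta was spicy and rich.",
    "Spread the nduja on warm bread.",
    "Spicy nduja with chicken.",
]
print(associations("nduja", reviews).most_common(1))
```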

                        If this still sounds interesting but unclear or unconvincing, then I would advise talking to them. I can give you contact information if you want; I know Catherine and one of the other partners from the time I was working 5 minutes away from the Media Lab and went over there every chance I got (that was in 2004-2007).

                        Claude Baudoin