
Question from this morning's webcast

  • Lee Romero
    Message 1 of 7 , Jun 29, 2010
      In regard to the session just completed on "Creating Innovative
      Enterprise Search Applications to enable Business Productivity".

      This is a general question for any search engine that claims to do
      entity extraction (which we heard a bit about on the FAST portion of
      this morning's session).

      If the engine performs entity extraction, are the entities that are
      found queryable in any way? Are the entities given any kind of stable
      URI that could be used to link the entities to other linked data you
      might have available (I'm thinking of semantic web technologies)?

      If the "same" entity is identified and extracted in multiple documents
      (or even the same document if it were to be re-indexed by the
      indexer), would it be identified as the same thing that was extracted
      previously (i.e., given the same URI as other instances of the
      entity)?
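To make the "same URI" requirement concrete, here is a minimal sketch of one way stable identifiers could be derived. It is not how FAST, OpenCalais, or any other product is known to work; the `http://example.org/entity/` base, the `ENTITY_NAMESPACE` constant, and the `entity_uri` helper are all assumptions made up for illustration. The idea is simply that the URI is computed from the normalized entity type and name, so re-indexing a document or finding the entity in another document yields the same identifier.

```python
import uuid

# Hypothetical base URI for this sketch; a real deployment would choose its own.
BASE_URI = "http://example.org/entity/"
ENTITY_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, BASE_URI)


def entity_uri(entity_type: str, surface_form: str) -> str:
    """Derive a deterministic URI from an entity's type and name."""
    # Normalize case and whitespace so trivial variants map to one key.
    normalized = " ".join(surface_form.casefold().split())
    key = f"{entity_type.casefold()}:{normalized}"
    # uuid5 is a name-based UUID: the same key always gives the same value.
    return BASE_URI + str(uuid.uuid5(ENTITY_NAMESPACE, key))


if __name__ == "__main__":
    print(entity_uri("Company", "Thomson Reuters"))
    print(entity_uri("company", "thomson   reuters"))  # same URI as above
```

A scheme like this only handles trivial spelling variants; deciding that two differently written names refer to the same real-world thing is a separate disambiguation problem.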

      I'm familiar with OpenCalais ( http://www.opencalais.com/ ) as a
      service that does entity extraction (perhaps it's not quite the same
      thing but it seems very close, anyway), and the idea of a search
      engine doing something similar and providing a way to pull out the
      entities (as OpenCalais does) would be very intriguing.

      The above would be a huge step forward in terms of really connecting
      your content to other information you have available if it's part of
      any search engine's entity extraction functionality, FAST or not.

      Anyone have any idea on this?

      Thanks!
      Lee Romero
    • Jim Wessely
      Message 2 of 7 , Jun 29, 2010
        Vendors do not perform entity extraction equally well, and some merely claim to have the ability even when it is not truly commercially usable.  There are also different methods for performing entity extraction to consider.  Be sure to look at the quality of the tools a vendor supplies for customizing entity identification: some vendors offer only undocumented command-line capabilities, while others have robust tools and very nice, usable UIs.

        As for the different types of entity identification/extraction, there is "named" entity extraction, which is roughly analogous to looking up terms in a table.  Pattern matching is used a lot, too, commonly with regular expressions alongside other methods.  Entities can also be identified via part-of-speech disambiguation, which picks out phrases (commonly noun phrases); in my experience these are often more useful than single-term entities.
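The first two approaches can be illustrated with a rough sketch, assuming a tiny hand-made term table and two simple regular expressions. The `GAZETTEER` and `PATTERNS` contents and the `extract_entities` function are invented for this example and do not reflect any vendor's implementation; part-of-speech-based phrase extraction is not covered here because it needs a trained tagger.

```python
import re

# Table lookup: known terms mapped to an entity type (illustrative entries only).
GAZETTEER = {
    "sap": "Company",
    "inxight": "Company",
    "thomson reuters": "Company",
}

# Pattern matching: entity types recognized by shape rather than by lookup.
PATTERNS = {
    "Email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "Date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def extract_entities(text):
    """Return a list of (entity_type, matched_text) pairs found in text."""
    found = []
    for term, entity_type in GAZETTEER.items():
        # Whole-word, case-insensitive match of each table entry.
        term_pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(term_pattern, text, re.IGNORECASE):
            found.append((entity_type, match.group()))
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((entity_type, match.group()))
    return found


if __name__ == "__main__":
    sample = "SAP acquired Inxight; contact jim@example.com on 2010-06-29."
    for entity in extract_entities(sample):
        print(entity)
```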

        FYI, Open Calais is built upon the technology developed by Inxight, which was purchased by Business Objects, which is now part of SAP.  SAP/Business Objects licenses this to Clear Forest (now part of SAS, if memory serves), which is the foundation of Open Calais.   This is very, very good entity identification that I have not seen matched by any of the search vendors so far.  But then again, I am not completely up to date with my investigation of search vendors, so I can't say if any of them have developed really good entity extraction capabilities these days.

        Jim

      • Guy
        Message 3 of 7 , Jun 30, 2010
          Hi Lee,

          Shame I missed the webinar, sounds like interesting subject matter and I think this was with Nate, yes? If so, all the better.

          As Jim points out, capabilities vary between vendors offering entity extraction. Many users of this software choose to separate the annotation of content (using e.g. Nstein, Attensity, OpenCalais, Tika) from the indexing and searching of it. FAST, like other search engines, makes use of that markup whether it was generated by them or by those other services, and thus enables it all to be queryable.
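As a rough sketch of that separation, assume the annotation step has already produced, for each document, a list of (entity type, name) pairs. The indexing side can then build a simple inverted index over those pairs so that entities become queryable. The function names and the input shape below are assumptions for illustration, not the API of FAST or of any annotation service.

```python
from collections import defaultdict


def build_entity_index(annotations):
    """Map each (entity_type, normalized_name) to the set of documents mentioning it.

    annotations: dict of doc_id -> list of (entity_type, name) pairs,
    as produced by whatever annotation step runs before indexing.
    """
    index = defaultdict(set)
    for doc_id, entities in annotations.items():
        for entity_type, name in entities:
            index[(entity_type, name.casefold())].add(doc_id)
    return index


def docs_mentioning(index, entity_type, name):
    """Return the sorted ids of documents annotated with the given entity."""
    return sorted(index.get((entity_type, name.casefold()), set()))


if __name__ == "__main__":
    annotations = {
        "doc1": [("Company", "SAP")],
        "doc2": [("Company", "sap"), ("Company", "Inxight")],
    }
    index = build_entity_index(annotations)
    print(docs_mentioning(index, "Company", "SAP"))
```

Because the index only consumes the annotation output, the annotator can be swapped without touching the query side.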

          Separating these functions takes effort and cost, but it does let you manage the entities in the way you describe and link them. That said, there are numerous smart people at FAST who have certainly looked at this in detail :-)

          BTW I believe it was Reuters (Thomson Reuters) who snapped up ClearForest, whereas SAP acquired Inxight.

          Lee, follow @sethgrimes for comprehensive analysis of text analytics.

          Best,
          Guy
        • Jim Wessely
          Message 4 of 7 , Jun 30, 2010
            I think you are right, Guy.  I believe it was Thomson Reuters, now that you mention it.  Was it Teragram that SAS bought, or do I have the acquisitions all confused?  SAP definitely was the one that acquired Inxight.  I am in Europe working with SAP and their Text Analysis software (formerly Inxight) this week.

            Sorry I missed the Webinar, too.  Sounds like it was an interesting one.

            Jim

          • Guy
            Message 5 of 7 , Jul 2 5:45 AM
              Hi Jim - I think you are right about Teragram and SAS. If you get a chance to get to London, give me a shout: http://uk.linkedin.com/in/guyvalerio
              Guy
            • Lee Romero
              Message 6 of 7 , Jul 6 10:33 AM
                Jim and Guy - Thanks for your replies to my email.

                The topic I asked about was mostly tangential to the webinar - it
                happened to be on one of the slides and this is a question I've long
                wondered about.

                I know "entity extraction" is a common buzzword for search engines and
                have never understood what any search engine actually does (other than
                use the buzzword, of course).

                Whatever they do, it seems like it would be most useful if you could
                (as an administrator of the search) somehow access the entities that
                are extracted and also depend on their "stability".

                Is there any writing anywhere you could point me to about what any
                particular (search) vendors are doing in this area? If FAST is
                looking at it, is there any description of what they're doing?

                Thanks again!
                Lee

              • Matt Moore
                Message 7 of 7 , Jul 7 6:49 AM
                  Hi,

                  It should be noted that entity extraction is not only used to improve search. It also allows content in one resource to be linked automatically to another: e.g., if a text string in a document is identified as a company, it can be linked to an entry in something like Hoovers or Dun & Bradstreet. There's a little about this in an article I wrote called "cyborg metadata" (I do not have a link, but it's easy to find on Google). It's yer se-man-tik web innit?

                  Matt Moore
