Social Life of Words in search?

  • Lee Romero
    Message 1 of 4 , Feb 22, 2008
      Hi all - I've been considering various metrics for search as part of
      some writing I'm doing on my blog and one idea has struck me that I
      wanted to ask if anyone here has considered or tried or if it seems
      not-so-useful.

      I've recently been looking at the data for a good-sized set of
      searches from our enterprise search tool trying to identify the kinds
      of questions I should be asking that could help me better understand
      what users are doing (or trying to do) and also understand what kinds
      of improvements we need for our search experience. (This is primarily
      what I'm planning my blog writings to be about - what I already can
      understand, what I can't but would like to and also trying to figure
      out what others are looking at in this area.)

      The data set I have has pulled apart the full search terms for each
      individual search into their constituent words - which has been much
      more insightful than looking at the full searches because it removes a
      lot of the variation in the ordering of words or the inclusion of
      additional words.

      While looking at the data, I've got the idea in my head that it could
      possibly be useful to map out the "social network" of the words. The
      germ of the idea is that each word could be considered a node in the
      network and the strength of the tie between words is the number of
      searches in which both words occur together.
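
      A minimal sketch of that tie-strength idea, assuming each search is a
      plain string split on whitespace. The function name and the sample
      queries are hypothetical, invented for illustration rather than taken
      from the actual data set:

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(searches):
    """Count, for each unordered word pair, the searches containing both words."""
    ties = Counter()
    for search in searches:
        # Lowercase, split on whitespace, and de-duplicate words within one search
        # so a repeated word does not inflate the tie strength.
        words = sorted(set(search.lower().split()))
        # Sorting gives each pair one canonical (alphabetical) key.
        for pair in combinations(words, 2):
            ties[pair] += 1
    return ties


# Hypothetical sample queries, not real search-log data.
sample_searches = [
    "expense report form",
    "travel expense policy",
    "expense report",
    "holiday calendar",
]
ties = cooccurrence_counts(sample_searches)
```

      Each key in `ties` is an edge between two word nodes and each value is
      the strength of that edge, so the result can be fed to whatever graph
      or visualization tool is at hand.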

      This seems like it might just be the idea of looking for "clustering"
      in another guise.

      However, the visualization of a set of words could make for an
      interesting way to understand the relationship between words
      (especially if using a tool that would make it possible to navigate
      through the set dynamically). Also, the ability to analyze the overall
      linkage (the types of analyses performed on actual social networks)
      could potentially provide insight about how to tweak things like
      synonyms, etc., that might be useful.

      Has anyone ever tried this?

      Any thoughts on the general idea? Crazy? Tried and true (but I'm
      just ignorant on the idea :-) )?

      Some of the challenges I can imagine would include:

      * Volume of data - the # of nodes is very high. Restricting it to a
      subset is necessary, but how to identify the subset?
      * The probability of very dense networks seems high - especially among
      the most common terms. This would seem to make any insights not so
      insightful ("OK - so your top 100 terms are almost all linked together
      - what does that tell you except that those words are commonly
      used???")
      * To deal with both of the above, it'd probably make sense to set some
      kind of minimum threshold for the strength of the link between two
      nodes (words) before the words are actually considered "linked". That
      could make it hard to understand the data as well.
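
      One way the subset and threshold ideas above might be sketched,
      assuming the subset is simply the N most frequent words and the
      threshold is a minimum co-occurrence count. The Jaccard-style
      normalization at the end is an added suggestion, not something from
      the original idea; it is one possible way to keep very common words
      from dominating. All names and sample queries are hypothetical:

```python
from collections import Counter
from itertools import combinations


def build_ties(searches, top_n=None):
    """Return (ties, freq): pair co-occurrence counts and per-word search counts."""
    word_sets = [set(s.lower().split()) for s in searches]
    # freq[w] = number of searches that contain word w at least once.
    freq = Counter(w for ws in word_sets for w in ws)
    # Optionally restrict the network to the top_n most frequent words.
    keep = set(freq) if top_n is None else {w for w, _ in freq.most_common(top_n)}
    ties = Counter()
    for ws in word_sets:
        for pair in combinations(sorted(ws & keep), 2):
            ties[pair] += 1
    return ties, freq


def prune_ties(ties, min_count=2):
    """Keep only pairs that co-occur in at least min_count searches."""
    return {pair: n for pair, n in ties.items() if n >= min_count}


def jaccard_scores(ties, freq):
    """Normalize each tie: searches with both words / searches with either word."""
    return {(a, b): n / (freq[a] + freq[b] - n) for (a, b), n in ties.items()}


# Hypothetical sample queries, not real search-log data.
sample_searches = [
    "expense report form",
    "travel expense policy",
    "expense report",
    "holiday calendar",
]
ties, freq = build_ties(sample_searches)
linked = prune_ties(ties, min_count=2)
scores = jaccard_scores(ties, freq)
```

      With a raw-count threshold, rare but tightly bound pairs drop out
      first; a normalized score keeps them visible, at the cost of also
      promoting pairs that were only ever searched once.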

      Anyway - just thought I'd post to the group to see if anyone has any
      thoughts on this idea...

      Thanks
      Lee Romero
    • Bob Bater
      Message 2 of 4 , Feb 23, 2008

        Lee,

         

        Your description of what you want to do sounds a bit like something I wanted to do some years ago, but for different reasons from yours. I used an early version of an application called TextAnalyst, and it seems to me it might do what you want. Take a look at it here: http://www.megaputer.com/textanalyst.php.

         

        Regards,

         

        Bob Bater

        InfoPlex Associates, UK

        www.infoplex-uk.com

         

         

        From: SearchCoP@yahoogroups.com [mailto:SearchCoP@yahoogroups.com] On Behalf Of Lee Romero
        Sent: 22 February 2008 19:02
        To: searchcop
        Subject: [SearchCoP] Social Life of Words in search?

         


      • Louis Rosenfeld
        Message 3 of 4 , Feb 25, 2008
          Lee, really interesting posting.  I certainly can envision at least drawing links between people through shared individual queries, if not query collections.  "Other people who also searched for this..." wouldn't likely fly on the open web for privacy reasons, but might be quite effective within a closed, trusted, captive audience context, like an enterprise intranet.

          I'd love to hear more about your site search analytics plans.  Rich Wiggins and I have been working on and off on a book on the topic (see here:  http://www.rosenfeldmedia.com/books/searchanalytics/ ), and we're always looking for good examples to feature.  (To those tired of me mentioning this never-quite-finished book:  I swear it'll be written this spring, scout's honor!)

          You asked what sort of questions to ask your data:  here's a list of generic starter questions, heavily influenced by Avi Rappoport's work:
          1. What are the most frequent unique queries?
          2. Are frequent queries retrieving quality results?
          3. Click-through rates per frequent query?
          4. Most frequently clicked result per query?
          5. Which frequent queries retrieve zero results?
          6. What are the referrer pages for frequent queries?
          7. Which queries retrieve popular documents?
          8. What interesting patterns emerge in general?  
          These are typically enough to get started, and they'll lead to more specific, contextual follow-up questions.  Good luck!
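
          As a rough sketch of how questions 1 and 5 might be answered from a log, assuming each log entry is a (query, result_count) pair. That log shape, the function names, and the sample entries are all assumptions made for illustration; a real search engine's log will look different:

```python
from collections import Counter


def normalize(query):
    """Lowercase and collapse whitespace so trivial variants count as one query."""
    return " ".join(query.lower().split())


def most_frequent_queries(log, n=10):
    """Question 1: the n most frequent unique queries with their counts."""
    counts = Counter(normalize(query) for query, _ in log)
    return counts.most_common(n)


def frequent_zero_result_queries(log, min_searches=2):
    """Question 5: queries searched at least min_searches times that returned nothing."""
    total = Counter()
    zero = Counter()
    for query, result_count in log:
        key = normalize(query)
        total[key] += 1
        if result_count == 0:
            zero[key] += 1
    hits = [(q, zero[q]) for q in zero if total[q] >= min_searches]
    # Most zero-result searches first, then alphabetical for a stable order.
    return sorted(hits, key=lambda item: (-item[1], item[0]))


# Hypothetical log entries: (query text, number of results returned).
sample_log = [
    ("Expense Report", 12),
    ("expense report", 12),
    ("vpn  setup", 0),
    ("vpn setup", 0),
    ("holiday calendar", 3),
    ("benefits portl", 0),
]
top_queries = most_frequent_queries(sample_log, n=2)
zero_hits = frequent_zero_result_queries(sample_log)
```

          The click-through questions (3, 4 and 7) would need click data joined to each query, which this sketch does not attempt.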

          cheers





          --
          Louis Rosenfeld :: http://louisrosenfeld.com
          Rosenfeld Media :: http://rosenfeldmedia.com
        • Lee Romero
          Message 4 of 4 , Feb 26, 2008
            Hi Louis - thanks for your comments. I've got a decent set of
            analytics to use for search, though, unfortunately, at the moment the
            engine itself provides metrics only on the searches, not on what I'd
            consider the "web analytics" side of things (your numbers 3, 4 and 7).
            We do have a web analytics solution, but it has never been configured
            to capture those particular metrics directly (I might be able to infer
            them with some digging but haven't yet).

            I also like the idea of drawing links between people based on queries,
            though my thoughts on this were more on the connections between the
            queries and the words used within the queries. I'm not sure the
            insight from that would be actionable in any way - just wondering if
            anyone's done any kind of analysis like that in the past to give me a
            clue if it's worthwhile to pursue.

            I've been, um, time challenged in getting my own thoughts together on
            my blog on this topic but hope to have something published yet this
            week.

            Thanks also to Bob for the pointer to TextAnalyst.

            Lee Romero

