
Re: [ona-prac] Some basic questions about linkages / algorithms

  • Lee Romero
    Message 1 of 5, Oct 13, 2008
      Hi Valdis - for the most part, the words are identified through the
      person's activities (so, largely, by the person him/herself).

      Some examples:

      I am a member of a mailing list named "software-development-community"
      and I have posted 20 times to that mailing list in the last year on
      various topics (different subject lines).

      I also may have posted a number of items to my blog, with various
      titles, slotted into a set of categories (which I, as the author,
      have assigned).

      Lastly, I have edited, say, 2 dozen different pages in our wiki site
      with various titles and assigned various keywords.

      Based on the above, my profile would include keywords generated from:

      1. The names of the mailing lists of which I'm a member, with a weight
      equal to the # of posts I've made (so each of "software",
      "development" and "community" is weighted 20 based on 20 posts).

      2. The words from the subject lines of those 20 emails I've posted to
      the software-development-community mailing list, with a weight of 1
      for each occurrence of any given word. So if, say, I have written 5 of
      those posts about Eclipse (and the word "Eclipse" occurs in the
      subject line of all five of those) in that span of time, "Eclipse"
      would get a weight of 5 from my mailing list posts.

      3. The titles and categories for my blog posts are also broken into
      keywords and assigned a weight of 1 for each occurrence of any given
      word.

      4. The titles and categories of each wiki page are broken down into
      keywords and assigned a weight equal to the number of times I've
      edited any given page (so if I edit a particular page 10 times, each
      keyword from the title or category is assigned a weight of 10 for that
      page).

      And so on. I say that the keywords are self-identified "for the most
      part" because I probably don't have much control over the actual
      names of the projects or teams on which I work - my hope as an
      implementer of this idea is that those names will generally be
      chosen sensibly, reflecting words someone might actually use when
      trying to find another person.
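
      A minimal sketch of how that accumulation might look in code - the
      activity record fields and helper names below are hypothetical, not
      my actual implementation:

      from collections import Counter
      import re

      def tokenize(text):
          # Split a list name, subject line or page title into lowercase keywords.
          return re.findall(r"[a-z0-9]+", text.lower())

      def build_profile(activities):
          # Accumulate weighted keywords from one person's activities.
          # Each activity is a hypothetical dict such as:
          #   {"type": "mailing-list", "name": "software-development-community", "posts": 20}
          #   {"type": "email", "subject": "Eclipse build question"}
          #   {"type": "wiki-page", "title": "Build server setup", "edits": 10}
          profile = Counter()
          for act in activities:
              if act["type"] == "mailing-list":
                  # List-name keywords weighted by the number of posts to that list.
                  for kw in tokenize(act["name"]):
                      profile[kw] += act["posts"]
              elif act["type"] == "email":
                  # Subject-line keywords weighted 1 per occurrence.
                  for kw in tokenize(act["subject"]):
                      profile[kw] += 1
              elif act["type"] == "wiki-page":
                  # Page-title keywords weighted by the number of edits to that page.
                  for kw in tokenize(act["title"]):
                      profile[kw] += act["edits"]
          return profile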

      I've currently included about a dozen sources of activities or project
      / team memberships in this work - not yet sufficient to cover all
      people who might be included, but I think it's sufficient to at least
      validate this approach to doing the "expertise location" that
      originally drove the work.

      So if you write a blog post or send an email or edit a wiki page with
      the word "jerk" in it, that keyword ends up associated with you.
      Because the sources for this are internal (corporate), I generally
      don't think that will be that much of a problem.

      The idea is to try to identify keywords that are relevant to someone
      using a means that directly looks at what you work on or write about
      or are assigned to, etc. Hopefully, it can reduce the need to
      maintain some definition of my own skills in a system that I (or my
      manager, or perhaps my co-workers) would otherwise need to keep up to date.
      Basically, I'm "tagging" myself indirectly through my work. (I think
      that if you had a system that directly stored skills of workers -
      i.e., a "skills inventory database" - that could be treated as nothing
      more than an additional data source for this - probably one with a
      higher weighting than other sources, obviously).

      Does that answer the question?

      Thanks for the pointers, too, Valdis.

      Regards
      Lee

      On Mon, Oct 13, 2008 at 2:15 PM, Valdis Krebs <valdis@...> wrote:
      > Lee,
      >
      > Who picks the words? Who assigns the words to whom? Who weights each
      > word for each person?
      >
      > Do people also nominate each other for who they actually go to for
      > expertise on A, B or C?
      >
      > "X may be a high word & high weight person that no one goes to because
      > X is a jerk."
      >
      > You can find similarity by attributes [your approach] or links or both.
      >
      > See my analysis of political books on Amazon [people that bought this
      > also bought that...]
      >
      > http://www.orgnet.com/divided.html
      >
      > Valdis
      >
      >
      >
    • Charles Armstrong
      Message 2 of 5, Oct 15, 2008
        hallo lee

        what you describe is very close to what trampoline's sonar technology does. sonar server gobbles up emails, ldap, documents and so forth ("work products" as you term it), deduces each person's expertise and knowledge through statistical language modeling, then calculates the network characteristics for each person using ona techniques. armed with this intelligence sonar can start to identify documents and contacts which are likely to be relevant to a particular user.

        the specific use case you mention of identifying emergent communities of interest is something we've encountered a fair amount of demand for with our flightdeck product. basically it involves identifying people who share a strong interest in a particular field (though their interests may diverge in other areas) but have no identifiable communication with each other. adding a time element to this, up-weighting areas of interest that have only started being identified recently, helps highlight fast-growing trends.
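
        purely as a rough sketch of the shape of that idea - this is not how
        sonar or flightdeck actually work, and the function names, threshold
        and half-life below are made up for illustration:

        import math

        def recency_weight(base_weight, age_days, half_life_days=90.0):
            # up-weight interests seen recently; older activity decays toward zero
            return base_weight * math.exp(-math.log(2) * age_days / half_life_days)

        def latent_pairs(similarity, communication, threshold=0.15):
            # similarity:    dict mapping a (person_a, person_b) pair to a similarity score
            # communication: set of pairs already observed communicating with each other
            flagged = [
                (a, b, score)
                for (a, b), score in similarity.items()
                if score >= threshold
                and (a, b) not in communication
                and (b, a) not in communication
            ]
            # strongest undiscovered connections first
            return sorted(flagged, key=lambda t: t[2], reverse=True)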

        i can't answer your specific question about similarity matching as i'm just a humble ethnographer. if you're interested i'd be happy to link you up with someone in the team who knows more about the statistical aspects.

        yours : charles


        chief executive // trampoline systems ltd
        the trampery, 8-15 dereham place, london EC2A 3HJ
        uk cell +44 7792 456807
        usa cell +1 415 728 8656

        On 13 Oct 2008, at 18:44, Lee Romero wrote:

        Hi all - (Apologies for the length of this - as I got to describing
        what I'm trying to do, I ended up writing more than I thought I
        would....)

        I am currently working on an effort to implement a type of "expertise
        location" function based on generating a profile of someone from
        his/her team/project associations and his/her work products (that's a
        pretty simple description for what I'm doing but it gets the point
        across).

        Part of this results in a set of keywords associated with each person
        in the population - the "expertise location" can then be thought of as
        providing a standard keyword search engine / interface on
        top of the set of keywords associated with the people (there's also
        some navigation in the application based on the keywords but I'm going
        to ignore that for the moment).

        For any one person in the system, the defining characteristics are a
        set of activities / team memberships and, for each of those, there are
        a set of keywords. Collapsing together all of the keywords into a
        "pool" of keywords for a person, you can think of the profile of a
        person as a set of weighted keywords (where the weight for a keyword
        is the number of occurrences of that keyword within the set of
        activities / team memberships).

        There are no specific restrictions about what is a valid keyword
        except that anything you might think of as a common "stop word" in a
        search engine is excluded (things like "the", "of", etc.). So my
        profile might (in part) look like: ("search", 100), ("knowledge", 40),
        ("management", 80), ("engineering", 20), etc.

        Also, because of how the keywords are generated and weighted, there is
        no upper limit on the weight of a keyword for any one person. To
        follow on from my example, I might have an additional 80 keywords with
        a total weight of, say, 5000, while someone else might have a total of
        40 keywords and a total weight of 4500.
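
        As a small sketch of that representation - the stop-word list here is
        hypothetical and truncated, and the keyword weights are just the
        example figures above:

        STOP_WORDS = {"the", "of", "and", "a", "to", "in"}   # illustrative, not the full list

        def clean_profile(raw_profile):
            # Drop common stop words; anything else counts as a valid keyword.
            return {kw: w for kw, w in raw_profile.items() if kw not in STOP_WORDS}

        profile_x = clean_profile({"search": 100, "knowledge": 40,
                                   "management": 80, "engineering": 20, "the": 7})
        total_weight_x = sum(profile_x.values())   # no upper bound on any one keyword's weight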

        All of this is pretty straightforward and, even though the basic idea
        seems pretty simple, the keyword search function across these profiles
        turns out to be surprisingly good at finding someone with a
        particular skill or expertise. (This makes me happy, as I wasn't sure
        if it would really "work" as expected like this!)

        What I've been considering now is to take these people profiles and
        try to do two additional things which are related: provide a measure
        of "similarity" between two people and, from that measure of
        similarity, try to identify "invisible" communities of interest (by
        identifying pockets of people who have high similarity among
        themselves).

        The idea of a "similarity" metric is intriguing because by itself it
        means that the presentation of a particular person's profile can
        include a means to identify people similar to the one you're looking
        at. Though it's kind of crude to liken this to an ecommerce site, I
        do think of it as similar to the function you see on many sites where
        when you're looking at a product, you are presented with a list of
        similar products. ("People who have found this person interesting
        might find these other people interesting!" :-) )

        My question: Has anyone done something similar to this before? If
        so, what approach have you taken to defining the similarity
        measurement?

        Here's the quandary I've run into, which has prompted my question:

        * To measure similarity between two people (X and Y), I first match
        the keywords between X and Y.

        * For each keyword (KW) the two people have in common, I credit each
        person with the minimum of the weight of KW for X and the weight of KW
        for Y. So if two people have the keyword "engineering" and one has a
        weight of 20 and another has a weight of 60, their similarity for this
        keyword is 20.

        * I then sum up these keyword weights to get a total "similarity
        weight" between the two people. Let's say it's 800 across all of the
        common keywords for two people.

        * Lastly, in order to reflect how much of that commonality describes
        each of the people, I calculate the percentage of someone's profile
        that is "covered" by the similarity weight to get the overall
        "similarity measure". So if X has a total profile weight of 5000 and
        Y has a total profile weight of 4000, that means that person X is
        16% (800/5000) similar to Y, while Y is 20% (800/4000) similar to X.
        This asymmetry makes some sense to me because we are comparing
        "different size" profiles (so person Y can seem more like person X
        than person X seems like person Y).
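
        In code, the two quantities I just described might look roughly like
        this (a sketch of the calculation above, not my actual implementation):

        def similarity_weight(profile_x, profile_y):
            # Sum, over the keywords the two people share, of the smaller of the two weights.
            shared = profile_x.keys() & profile_y.keys()
            return sum(min(profile_x[kw], profile_y[kw]) for kw in shared)

        def similarity_measure(profile_x, profile_y):
            # Fraction of X's total profile weight "covered" by the overlap with Y.
            # Asymmetric: similarity_measure(x, y) and similarity_measure(y, x) generally differ.
            weight = similarity_weight(profile_x, profile_y)
            total_x = sum(profile_x.values())
            return weight / total_x if total_x else 0.0

        With the figures above (a shared weight of 800 against totals of 5000
        and 4000), the measure comes out to 0.16 in one direction and 0.20 in
        the other.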

        Now, getting back to my question - I can link people based on either
        of these computations - the "similarity weight" or the "similarity
        measure" - does either make more sense?

        If I use the "similarity weight" then I seem to have an issue where if
        two people both have "heavy" profile weights, they can seem highly
        similar based on their similarity weight even though their similarity
        weight might be a relatively small percentage of their total profile
        weight (say an overlap of 1000 when the weights of the two profiles
        are 8000 and 10000). Conversely, two people who both have small
        profile weights will seem dissimilar even if they have 100% overlap
        in their profiles!

        On the other hand, if I use the "similarity measure", it seems likely
        that anyone with a "heavy" profile weight will seem to have weak links
        because people are likely to have low percentage overlap with them,
        while people with small profiles can seem very similar (and so tightly
        linked) based on just a couple of keywords.
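
        To make that concrete with numbers (the "heavy" figures are the ones
        above; the "light" totals of 150 are made up for illustration):

        # Two "heavy" profiles: an overlap of 1000 against totals of 8000 and 10000.
        heavy_weight = 1000                # looks like a strong link by raw weight...
        heavy_measure = 1000 / 8000        # ...but only 12.5% of even the smaller profile

        # Two hypothetical "light" profiles with identical keywords, totals of 150 each.
        light_weight = 150                 # looks like a weak link by raw weight...
        light_measure = 150 / 150          # ...yet the profiles overlap 100%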

        Any thoughts from ONA practitioners on what might be the best way to
        link people in this situation?

        Sorry for the length - I'm planning to write about this experiment on
        my blog but thought I'd see whether you might have a suggestion for
        how to measure this and how to break the logjam in my head :-)

        Regards
        Lee Romero

        PS - Yes, I am aware of the perils of ONA via data mining - I don't
        plan to read much into the analysis here beyond possibly finding these
        "communities of interest", and not so much into how significant
        someone's position in the network might be or anything like that.


      • Lee Romero
        Message 3 of 5, Oct 15, 2008
          Thanks, Charles! That brings a smile to my face.

          I figured my own research musings (which is what this is so far) could
          not have been all that original (though I'd like to think I have some
          original ideas in the mix here). It sounds like your sonar technology
          may validate my own hypothesis, though.

          I haven't checked your site yet - is there information available about
          this tool (product?) that I could read through?

          If I have any other specific questions, I'll take them offlist as well.

          Regards
          Lee Romero



          On Wed, Oct 15, 2008 at 3:32 PM, Charles Armstrong
          <charles@...> wrote:
          > hallo lee
          > what you describe is very close to what trampoline's sonar technology does.
          > sonar server gobbles up emails, ldap, documents and so forth ("work
          > products" as you term it), deduces each person's expertise and
          > knowledge through statistical language modeling, then calculates the
          > network characteristics for each person using ona techniques. armed with
          > this intelligence sonar can start to identify documents and contacts which
          > are likely to be relevant to a particular user.
          > the specific use case you mention of identifying emergent communities of
          > interest is something we've encountered a fair amount of demand for with our
          > flightdeck product. basically it involves identifying people who share a
          > strong interest in a particular field (though their interests may diverge in
          > other areas) but have no identifiable communication with each other. adding
          > a time element to this, up-weighting areas of interest that have only
          > started being identified recently, helps highlight fast-growing trends.
          > i can't answer your specific question about similarity matching as i'm just
          > a humble ethnographer. if you're interested i'd be happy to link you up with
          > someone in the team who knows more about the statistical aspects.
          > yours : charles
          >
          > chief executive // trampoline systems ltd
          > the trampery, 8-15 dereham place, london EC2A 3HJ
          > uk cell +44 7792 456807
          > usa cell +1 415 728 8656
          > http://trampolinesystems.com
          >