Re: [ona-prac] Some basic questions about linkages / algorithms
Oct 13, 2008
Lee,
Who picks the words? Who assigns the words to whom? Who weights each
word for each person?
Do people also nominate each other for who they actually go to for
expertise on A, B or C?
"X may be a high word & high weight person that no one goes to because
X is a jerk."
You can find similarity by attributes [your approach] or links or both.
See my analysis of political books on Amazon [people that bought this
also bought that...]
On Oct 13, 2008, at 1:44 PM, Lee Romero wrote:
> Hi all - (Apologies for the length of this - as I got to describing
> what I'm trying to do, I ended up writing more than I thought I
> would.)
> I am currently working on an effort to implement a type of "expertise
> location" function based on generating a profile of someone from
> his/her team/project associations and his/her work products (that's a
> pretty simple description of what I'm doing, but it gets the point
> across).
> Part of this results in a set of keywords associated with each person
> in the population - the "expertise location" can then be thought of as
> providing a standard keyword search engine / interface on
> top of the set of keywords associated with the people (there's also
> some navigation in the application based on the keywords but I'm going
> to ignore that for the moment).
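A minimal Python sketch of a keyword search over such profiles, assuming each profile is a dict of keyword to weight and assuming people are ranked by the summed weight of the query terms they hold. The function name `search_people`, the person "pat", and the scoring rule are illustrative assumptions, not a description of Lee's actual system:

```python
from typing import Dict, List, Tuple

# Hypothetical profile store: person -> {keyword: weight}
Profile = Dict[str, int]


def search_people(profiles: Dict[str, Profile], query: str) -> List[Tuple[str, int]]:
    """Rank people by the summed weight of the query keywords in their profile.

    Assumption: the query is lower-cased and split on whitespace; people with
    no matching keyword are left out of the results.
    """
    terms = query.lower().split()
    scores = []
    for person, profile in profiles.items():
        score = sum(profile.get(term, 0) for term in terms)
        if score > 0:
            scores.append((person, score))
    # Highest score first; ties broken by name so the output is stable.
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores


profiles = {
    "lee": {"search": 100, "knowledge": 40, "management": 80, "engineering": 20},
    "pat": {"engineering": 60, "testing": 30},
}
print(search_people(profiles, "engineering search"))
# [('lee', 120), ('pat', 60)]
```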
> For any one person in the system, the defining characteristics are a
> set of activities / team memberships and, for each of those, there are
> a set of keywords. Collapsing together all of the keywords into a
> "pool" of keywords for a person, you can think of the profile of a
> person as a set of weighted keywords (where the weight for a keyword
> is the number of occurrences of that keyword within the set of
> activities / team memberships).
> There are no specific restrictions about what is a valid keyword
> except that anything you might think of as a common "stop word" in a
> search engine is excluded (things like "the", "of", etc.). So my
> profile might (in part) look like: ("search", 100), ("knowledge", 40),
> ("management", 80), ("engineering", 20), etc.
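The profile construction described above can be sketched in Python with `collections.Counter`, assuming each activity / team membership has already been reduced to a list of keywords. The stop-word list and the activity names are hypothetical placeholders:

```python
from collections import Counter
from typing import Dict, List

# Hypothetical, deliberately tiny stop-word list; a real system would reuse
# the fuller list from its search engine.
STOP_WORDS = {"the", "of", "and", "a", "an", "to", "in"}


def build_profile(activities: Dict[str, List[str]]) -> Counter:
    """Pool the keywords of all of a person's activities / team memberships.

    A keyword's weight is the number of times it occurs across all of the
    activities; stop words are dropped before counting.
    """
    profile: Counter = Counter()
    for keywords in activities.values():
        profile.update(kw.lower() for kw in keywords if kw.lower() not in STOP_WORDS)
    return profile


activities = {
    "search-project": ["search", "the", "engineering", "search"],
    "km-team": ["knowledge", "management", "of", "search"],
}
profile = build_profile(activities)
print(profile["search"])      # 3
print(sum(profile.values()))  # 6 (total profile weight)
```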
> Also, because of how the keywords are generated and weighted, there is
> no upper limit on the weight of a keyword for any one person. To
> follow on from my example, I might have an additional 80 keywords with
> a total weight of, say, 5000, while someone else might have a total of
> 40 keywords and a total weight of 4500.
> All of this is pretty straightforward and, even though the basic idea
> seems pretty simple, the keyword search function across these profiles
> provides a surprisingly high correlation to finding someone with a
> particular skill or expertise. (This makes me happy, as I wasn't sure
> if it would really "work" as expected like this!)
> What I've been considering now is to take these people profiles and
> try to do two additional things which are related: provide a measure
> of "similarity" between two people and, from that measure of
> similarity, try to identify "invisible" communities of interest (by
> identifying pockets of people who have high similarity among one
> another).
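One simple way to turn pairwise similarity into "pockets" is to keep only links at or above a threshold and take connected components (here via union-find). This is a swapped-in technique, not something the post specifies; the threshold, the names, and the scores are hypothetical, and with Lee's asymmetric measure you would first need to pick one score per pair (for example the smaller or larger of the two directions):

```python
from typing import Dict, List, Set, Tuple


def find_communities(
    people: List[str],
    similarity: Dict[Tuple[str, str], float],
    threshold: float,
) -> List[Set[str]]:
    """Group people connected by chains of strong similarity links.

    Assumption: two people are linked when their pair's score is at or above
    `threshold`; a community is a connected component of those links.
    People with no strong link to anyone are left out.
    """
    parent = {p: p for p in people}  # union-find forest, one tree per group

    def find(p: str) -> str:
        # Walk up to the root of p's tree, halving the path as we go.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for (a, b), score in similarity.items():
        if score >= threshold:
            parent[find(a)] = find(b)  # merge the two people's groups

    groups: Dict[str, Set[str]] = {}
    for p in people:
        groups.setdefault(find(p), set()).add(p)
    return [g for g in groups.values() if len(g) > 1]


people = ["ann", "bob", "cai", "dee", "eve"]
similarity = {
    ("ann", "bob"): 0.6,
    ("bob", "cai"): 0.5,
    ("dee", "eve"): 0.7,
    ("cai", "dee"): 0.1,
}
print([sorted(g) for g in find_communities(people, similarity, threshold=0.4)])
# [['ann', 'bob', 'cai'], ['dee', 'eve']]
```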
> The idea of a "similarity" metric is intriguing because by itself it
> means that the presentation of a particular person's profile can
> include a means to identify people similar to the one you're looking
> at. Though it's kind of crude to liken this to an ecommerce site, I
> do think of it as similar to the function you see on many sites where
> when you're looking at a product, you are presented with a list of
> similar products. ("People who have found this person interesting
> might find these other people interesting!" :-) )
> My question: Has anyone done something similar to this before? If
> so, what approach have you taken to defining the similarity
> metric?
> Here's the quandary I've run into, which has prompted my question:
> * To measure similarity between two people (X and Y), I first match
> the keywords between X and Y.
> * For each keyword (KW) the two people have in common, I credit each
> person with the minimum of the weight of KW for X and the weight of KW
> for Y. So if two people have the keyword "engineering" and one has a
> weight of 20 and another has a weight of 60, their similarity for this
> keyword is 20.
> * I then sum up these keyword weights to get a total "similarity
> weight" between the two people. Let's say it's 800 across all of the
> common keywords for two people.
> * Lastly, in order to reflect how much of that commonality describes
> each of the people, I calculate the percentage of someone's profile
> that is "covered" by the similarity weight to get the overall
> "similarity measure". So if X has a total profile weight of 5000 and
> Y has a total profile weight of 4000, that means that the person X is
> 16% (800/5000) similar to Y, while Y is 20% (800/4000) similar to X.
> This asymmetry makes some sense to me because we are comparing
> "different size" profiles (so person Y can seem more like person X
> than person X might be similar to person Y).
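The bulleted steps translate directly into two small Python functions. The toy profiles are invented so that the overlap comes to 800 with total weights of 5000 and 4000, reproducing the numbers in the post; the function names are hypothetical:

```python
from typing import Dict

Profile = Dict[str, int]


def similarity_weight(x: Profile, y: Profile) -> int:
    """First three bullets: for each keyword X and Y share, take the smaller weight, then sum."""
    return sum(min(x[kw], y[kw]) for kw in x.keys() & y.keys())


def similarity_measure(x: Profile, y: Profile) -> float:
    """Last bullet: fraction of X's total profile weight covered by the overlap.

    Asymmetric: similarity_measure(x, y) != similarity_measure(y, x)
    whenever the two profiles have different total weights.
    """
    total = sum(x.values())
    return similarity_weight(x, y) / total if total else 0.0


# Invented profiles: overlap of 800, totals of 5000 (X) and 4000 (Y).
x = {"engineering": 20, "search": 780, "management": 4200}
y = {"engineering": 60, "search": 800, "testing": 3140}
print(similarity_weight(x, y))   # 800
print(similarity_measure(x, y))  # 0.16
print(similarity_measure(y, x))  # 0.2
```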
> Now, getting back to my question - I can link people based on either
> of these computations - the "similarity weight" or the "similarity
> measure" - does either make more sense?
> If I use the "similarity weight" then I seem to have an issue where if
> two people both have "heavy" profile weights, they can seem highly
> similar based on their similarity weight even though their similarity
> weight might be a relatively small percentage of their total profile
> weight (say an overlap of 1000 when the weights of the two profiles
> are 8000 and 10000). Similarly, two people who have a small profile
> weight will seem dissimilar even if they had 100% overlap in their
> profiles.
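A quick Python check of this concern, using invented profiles: two heavy profiles (totals 8000 and 10000) sharing 1000 of weight, and two identical light profiles (total 30 each):

```python
from typing import Dict

Profile = Dict[str, int]


def similarity_weight(x: Profile, y: Profile) -> int:
    """Sum, over shared keywords, of the smaller of the two weights."""
    return sum(min(x[kw], y[kw]) for kw in x.keys() & y.keys())


# Two "heavy" profiles (totals 8000 and 10000) sharing one keyword worth 1000.
heavy_a = {"search": 1000, "finance": 7000}
heavy_b = {"search": 1000, "legal": 9000}
# Two "light" profiles (total 30 each) that are identical.
light_a = {"search": 20, "taxonomy": 10}
light_b = {"search": 20, "taxonomy": 10}

for name, (p, q) in {"heavy": (heavy_a, heavy_b), "light": (light_a, light_b)}.items():
    w = similarity_weight(p, q)
    # Print the raw weight, then the share of each profile it covers.
    print(name, w, f"{w / sum(p.values()):.1%}", f"{w / sum(q.values()):.1%}")
# heavy 1000 12.5% 10.0%
# light 30 100.0% 100.0%
```

Ranked by similarity weight alone, the heavy pair (1000) sits far above the light pair (30), even though the light pair overlaps completely.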
> On the other hand, if I use the "similarity measure", it seems likely
> that anyone with a "heavy" profile weight will seem to have weak links
> because people are likely to have low percentage overlap with them,
> while people with small profiles can seem very similar (and so tightly
> linked) based on just a couple of keywords.
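The opposite effect of the similarity measure shows up with an invented light profile whose only two keywords both appear in a heavy one:

```python
from typing import Dict

Profile = Dict[str, int]


def similarity_measure(x: Profile, y: Profile) -> float:
    """Fraction of X's total weight covered by the min-weight overlap with Y."""
    overlap = sum(min(x[kw], y[kw]) for kw in x.keys() & y.keys())
    total = sum(x.values())
    return overlap / total if total else 0.0


heavy = {"search": 300, "knowledge": 200, "management": 4500}  # total 5000
light = {"search": 3, "knowledge": 2}                           # total 5
print(similarity_measure(light, heavy))  # 1.0   (light looks "fully" like heavy)
print(similarity_measure(heavy, light))  # 0.001 (heavy barely resembles light)
```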
> Any thoughts from ONA practitioners on what might be the best way to
> link people in this situation?
> Sorry for the length - I'm planning to write about this experiment on
> my blog but thought I'd see whether you might have a suggestion for
> how to measure this and how to break the logjam in my head :-)
> Lee Romero
> PS - Yes, I am aware of the perils of ONA via data mining - I do not
> read much into the analysis here beyond possibly finding these
> "communities of interest"; I am not trying to judge how significant
> someone's position in the network might be or anything like that.