
29859 Re: Ideas for Web analytics problems on clustering

  • gamerpatrick
    Dec 13, 2012
      Keeping in mind that clustering is a technique associated with data mining, what you're looking to accomplish will be muted or restricted in some ways if you run it against data that has already been aggregated. Your highest level of success will be against data you collect yourself, or data that has been collected but not "processed". That's where you'll see the most useful output from your efforts.

      That being said, the "problems" to which grouping/clustering techniques can be applied vary greatly. A lot of what has already been suggested makes a good start, but if you want to work on more interesting/exciting concepts, try looking at the meta-game (http://en.wikipedia.org/wiki/Metagaming). What I mean by this is creating clusters defined by the intersections of preceding clusters.

      For example:
      You are analyzing an online retail operation. You look at user segmentation by age and gender to come up with a classification for the broadest groups of users. You also look at classification by spending patterns (buyers vs. window-shoppers, buyers by lifetime spend, buyers by spend per session, etc.). You start analyzing the relationships between these clusters.

      Now create a new cluster definition (excuse me if I'm misusing terms) based on the intersection of user demographics and spend. Then, in terms of application, start looking at forecasting techniques that apply this meta-cluster analysis to new traffic (since you have lifetime spend and estimates of that spend over periodic breakdowns/intervals).
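
      As a rough sketch of that idea (not a definitive method): assume each user already carries a demographic segment label and a spend segment label from the two earlier clusterings. The meta-cluster is then just the pair of labels, and a naive forecast for a new visitor of a known demographic weights each meta-cluster's mean lifetime spend by its share of that demographic. The sample data and the names `summarize` and `forecast_spend` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical users: (demographic_segment, spend_segment, lifetime_spend).
users = [
    ("18-24 F", "window-shopper", 0.0),
    ("18-24 F", "buyer", 40.0),
    ("25-34 M", "buyer", 120.0),
    ("25-34 M", "buyer", 80.0),
    ("25-34 M", "window-shopper", 0.0),
]

def summarize(rows):
    """Describe each (demographic, spend) meta-cluster.

    Returns, per meta-cluster, its share of the demographic segment
    and the mean lifetime spend of its members.
    """
    groups = defaultdict(list)
    demo_counts = defaultdict(int)
    for demo, seg, spend in rows:
        groups[(demo, seg)].append(spend)
        demo_counts[demo] += 1
    return {
        (demo, seg): {
            "share": len(spends) / demo_counts[demo],
            "mean_spend": sum(spends) / len(spends),
        }
        for (demo, seg), spends in groups.items()
    }

def forecast_spend(summary, demo):
    """Naive expected lifetime spend for a new visitor in `demo`."""
    return sum(
        stats["share"] * stats["mean_spend"]
        for (d, _seg), stats in summary.items()
        if d == demo
    )

summary = summarize(users)
for key in sorted(summary):
    print(key, summary[key])
print("forecast 25-34 M:", round(forecast_spend(summary, "25-34 M"), 2))
```

      With real data the segment labels would come from whatever clustering you ran first, and the forecast would want the periodic spend breakdowns rather than a single lifetime figure.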

      It's a bit of a rabbit hole, but that's a fun exercise with a relatively low entry point in clustering that still provides some valuable (albeit simple) metrics for businesses.

      - Patrick
      http://pmazzotta.com



      --- In webanalytics@yahoogroups.com, "Michael Wexler" <wexler@...> wrote:
      >
      > Hard to add anything to those smart folks, but I'll try.
      >
      > Segmentation, or clustering of users, is the traditional go-to, and a great place to get started. I'd suggest considering some harder ideas as well, since they don't get as much press:
      >
      >
      > * Visit (Sequence) Clustering: it's much, much harder than just clustering aggregated scores. What makes a visit similar or dissimilar? How can we look at 10K visits and group them into "types": Tire-kicking visits vs. almost-ready-to-buy visits? They may look very similar in aggregate, but the order of pages may differentiate them. (And then ask how quickly you can classify the visit while it's live).
      >
      > ** A variant problem, btw, is to treat each visit as part of a larger shopping journey, and cluster those multi-visit journeys. It is useful to understand that some purchases involve very complex shopping patterns, while others are in-and-done in one visit. Are the more complex, multi-visit ones more profitable? More costly? You may want to aggregate up metrics around the visit (or use its cluster assignment) and treat each visit as a "page", then analyze using the same approach you did for the above.
      >
      >
      > * Page Clustering: There are many implicit clusterings of pages: for example, my science fiction books might all cluster as "SF Books Pages". But what about other pages? What role does my "FAQ" page play? Does it cluster with other post-support pages, or is it part of the shopping journey before purchase? What unexpected groupings might I see in the data? Again, same problems: how do you define the distance/similarity metric?
      >
      > For all of these, btw, clustering is one approach to throw at them, but there are a variety of classification approaches, esp. for streaming data, that are more advanced than the traditional KMeans and other clustering approaches. Of course, you may not need the more advanced stuff in many cases, but it's good to know it's out there. Also, we'd usually want to have a business outcome and recommendation: what change would we make to the site or experience based on these cluster findings? If we don't know, then the exercise may not be the best use of an analyst's limited time.
      >
      > Michael
      >
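
      On the quoted question of how to define a distance between visits: one possible starting point, offered only as a sketch, is edit (Levenshtein) distance over the ordered list of pages, normalized by the longer visit. This is a swapped-in, well-known metric, not something proposed in the thread, and the page names and the `visit_distance` helper are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of page names."""
    prev = list(range(len(b) + 1))
    for i, page_a in enumerate(a, start=1):
        curr = [i]
        for j, page_b in enumerate(b, start=1):
            cost = 0 if page_a == page_b else 1
            curr.append(min(
                prev[j] + 1,         # drop a page from a
                curr[j - 1] + 1,     # add a page from b
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]

def visit_distance(a, b):
    """Edit distance scaled to [0, 1] by the longer visit's length."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0

# Two hypothetical visits with identical page counts in aggregate
# but a different order of pages.
visit_1 = ["home", "product", "cart"]
visit_2 = ["cart", "product", "home"]
print(visit_distance(visit_1, visit_2))
```

      A pairwise distance like this can feed a clustering method that accepts a precomputed distance matrix (hierarchical clustering, for instance); KMeans in its usual form cannot use it directly because it needs coordinates to average.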