
Re: netflix contest..dinosaur planet

  • teamdinosaurplanet
    Message 1 of 5, Sep 14, 2007
      Hi theory-edge,

      First, thanks for giving us an opportunity to chat with you all.
      Since there are three of us, we'll try to give a composite answer to
      your questions...

      > a. can you give a general idea of what your algorithm is based on eg
      > SVD etc


      We can't reveal too many secrets right now with the deadline for the
      Progress Prize so close, but we have implemented and tried to
      incorporate all of the popular algorithms you hear mentioned on the
      forums: SVD, KNN, etc. I think it's pretty safe to say that most of
      the leading teams are using many, many different approaches. We try
      pretty much any idea that we can think of or learn about.

      > b. how much time have you spent on it? did you hear the hungarian team
      > (as I recall re NYT article) estimate they have spent 8 hrs per day
      > since the beginning of the contest?

      That has varied for all of us individually. When we first started, we
      were meeting a few nights a week to plan out the basic code just for
      handling the dataset, and then coding different parts individually. As
      we started getting more and more ideas, things ramped up quite a bit.
      The amount of time we spend definitely comes in waves. There have been
      weeks when we would spend 5 or 6 hours every day working on it. There
      have also been weeks (and months) of dry spells where we wouldn't work
      on Netflix at all.

      > c. what is your feeling on competing against the worlds greatest phds
      > on this problem & coming up 2nd?


      David W: It's very cool, but I also think it's important to point out
      that a lot of what we've done would have been impossible without the
      many papers and textbooks published by lots of the other top
      contestants. It's not like the three of us have independently invented
      a super algorithm that beats the best of what has been invented so
      far; rather, we've spent a lot of time reading existing papers, trying
      to implement them, and eventually understanding them well enough to
      tweak and optimize them for our particular implementation.

      David L: Yeah, I agree pretty much with what David Weiss said. A lot
      of the work published by other teams is really great and our solution
      uses a lot of it, along with, of course, several of our own secret ideas.

      Lester: It's also clear that some teams want to focus on a particular
      algorithm or family of algorithms, to demonstrate how they stack up to
      other existing machine learning approaches on this massive data set.
      We have no such single-algorithm loyalty -- we love 'em all.


      > d. are you guys working right now? have you all graduated?

      We all graduated from Princeton Univ. in June, and now...

      Lester: I've just started as a grad student at UC Berkeley in the EECS
      (Electrical Engineering and Computer Science) department. The contest
      has encouraged me to continue exploring machine learning.

      David W: I'm back at Princeton working as an RA with my neuroscience
      advisor, Ken Norman. (I majored in Computer Science, but with a
      certificate in Neuroscience). I'm applying to grad school now. During
      my morning commute (I live in Philadelphia), I'm also working with my
      older brother on his startup company (www.medforward.com).

      David L: I just started a job in New York trading interest rate
      derivatives and am pretty excited about that.

      >
      > e. do you have home pages anywhere?

      David W: Not yet...but I'm working on one now.

      Lester: Not yet -- that's a good idea though.

      > f. do you think the netflix contest is winnable? any estimate on when
      > it will be awarded?

      Lester: I definitely think the Grand Prize threshold is attainable --
      I predict that someone will reach it in another year or two.

      David W: I also agree. Every time we think we've hit the ceiling,
      another idea/optimization comes along, and we're back in the race
      again. It could get to be a pretty agonizing crawl before the end,
      though.

      David L: I'll have to differ and say I don't think a 10% improvement
      is attainable, but I still have hope. There is only so much
      information that can be crunched out of this dataset.

      > g. any complaints about the contest?
      >


      David W: Not really -- I'm just surprised that Netflix hasn't been
      more communicative with the contestants.

      David L: I think Netflix has been doing a great job and that the
      contest is extremely well designed and well run. My only complaint is
      with that "added" data they posted (the results from the KDD Cup) that
      gives you a few thousand more ratings and the number of ratings each
      movie received in 2006. Running all our algorithms on those
      additional ratings probably won't help that much, but it's definitely
      a huge pain. I don't think they should have changed the available data
      after the contest started.

      Lester: The contest was very well designed -- my only minor complaint
      was that late release of extra data.

      > h. any advice?

      To do well in the contest, I think you need to read a lot of papers
      and implement anything you come across. The number of ideas we have
      tried that didn't help is pretty ridiculous.

      >
      > i. do you have background in this area? are you students at the top of
      > your class, or average? won awards etc?

      We've taken a few courses in AI, data-mining, etc. but we're all
      pretty new to the machine learning realm.
      We're learning more every day.

      David W: Aside from one or two graduate courses, we don't have any
      particular background. I didn't even know the term "collaborative
      filtering" when we first started working on it. However, we have all
      won some sort of awards at some time or another.