Thursday, March 13, 2008

Welcome

Welcome, all! Let me introduce myself. My name is Len Bertoni and I am a member of team acmehill, which consists of myself, my wife, my daughter and my son. We are competing for the Netflix Prize (http://www.netflixprize.com/) and have recently cracked the top 10.

Team acmehill downloaded the Netflix Prize data about a year ago, but only recently "got serious" about attempting to do anything with it, starting our leaderboard climb in Nov 2007. Our computing power is rather modest: a 2-year-old eMachines PC (3 GHz, 2 GB RAM, 160 GB HD) purchased from Wal-Mart. However, since the contest is as much about thinking through why and how people rate movies as it is about applying proven (and not so proven!) mathematical techniques to the provided data, we have not been limited by hardware too much. All programs are written in VB.NET using Visual Basic Express 2005. All code is written "from scratch" and no external mathematical libraries are used.

My family provides me with new ideas to test out while I subject them to countless "fun hours" of delving into matrix factorization (Dad, what is a matrix anyway?), neural nets (does anyone really know how they work anyway?), etc.

I have tried the following techniques with varying success (many do not survive in my final blend, as I cannot bear to blend hundreds of files to produce my final prediction set):

KNN on Movies,
Matrix Factorization,
User/Movie Clustering,
Neural Nets,
Restricted Boltzmann Machines,
Anchoring,
Milestones,
Global Effects Removal

Right now I am implementing KNN on users, which is quite a bit more computationally intensive than KNN on movies: with roughly half a million users, finding each user's nearest neighbors means comparing far more pairs. Hopefully team acmehill will have some success with this approach.
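
To give a feel for why the user count matters, here is a minimal Python sketch (not the team's VB.NET code). It counts the unordered pairs a brute-force neighbor search would have to compare, then runs a tiny user-based nearest-neighbor lookup on made-up ratings. Pearson correlation is assumed as the similarity measure, which may not be what team acmehill uses, and the names `pair_count`, `pearson`, `nearest_users` and the toy data are all hypothetical.

```python
from math import sqrt


def pair_count(n):
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def pearson(a, b):
    """Pearson correlation over the movies both users rated.

    a and b map movie id -> rating. Returns 0.0 when the correlation
    is undefined (fewer than two shared movies, or zero variance).
    """
    common = [m for m in a if m in b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[m] for m in common) / len(common)
    mean_b = sum(b[m] for m in common) / len(common)
    num = sum((a[m] - mean_a) * (b[m] - mean_b) for m in common)
    den = sqrt(sum((a[m] - mean_a) ** 2 for m in common)
               * sum((b[m] - mean_b) ** 2 for m in common))
    return num / den if den else 0.0


def nearest_users(ratings, user, k):
    """Brute force: score every other user, keep the k most similar."""
    sims = [(pearson(ratings[user], ratings[other]), other)
            for other in ratings if other != user]
    sims.sort(reverse=True)
    return sims[:k]


# Made-up ratings for illustration only (user -> {movie: rating}).
toy_ratings = {
    "u1": {"m1": 5, "m2": 3, "m3": 1},
    "u2": {"m1": 4, "m2": 3, "m3": 2},
    "u3": {"m1": 1, "m2": 3, "m3": 5},
}

if __name__ == "__main__":
    # Half a million users, as a round figure taken from the post.
    print(pair_count(500_000))
    print(nearest_users(toy_ratings, "u1", 1))
```

With half a million users, the brute-force search touches about 125 billion user pairs, which is why a modest machine needs some way to prune or cache the comparisons rather than scoring every pair.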