Online learning has recently caught a lot of attention, following some competitions, and especially after Criteo released 11GB for the training set of a Kaggle contest.
Online learning allows to process massive data as the learner processes data in a sequential way using up a low amount of memory and limited CPU ressources. It is also particularly suited for handling time-evolving date.
Vowpal Wabbit has become quite popular: it is a handy, light and efficient command line tool allowing to do online learning on GB of data, even on a standard laptop with standard memory. After a brief reminder of the online learning principles, we present how to run Vowpal Wabbit on Hadoop in a distributed fashion.