We built an Elasticsearch Learning to Rank plugin. Then came the hard part.

06/13/2017 - 11:00 to 11:40
long talk (40 min)

Session abstract: 

Learning to Rank uses machine learning to improve the relevance of search results. In this talk, I discuss how we built a learning to rank plugin for Elasticsearch. But what's more interesting is what happened next. Learning to rank requires new ways of thinking about search relevance, and in this talk I go on to discuss the specific problems faced by production-ready learning to rank systems. We learned these lessons the hard way so you don't have to. These systems need to solve a variety of problems, including:

  • Correctly measuring, using analytics, what a user deems "relevant" or "irrelevant"
  • Hypothesizing which features of users, queries, or documents (or query-user dependent features) might correlate with relevance
  • Logging/Gathering hypothesized features using the search engine
  • Training models in a scalable fashion
  • Selecting and evaluating models for appropriateness and minimal error
  • Integrating models in a live search system alongside business logic and other non-relevance considerations
  • A/B testing learning to rank models and avoiding future bias of training data

Each of these requires solving pretty tough problems. This talk will discuss our war stories, practical lessons, and the goings-on inside real-life search implementations, to help you see which pitfalls to avoid and decide whether learning to rank is the right direction for your search problem.
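
To make the model evaluation step above a little more concrete, here is a minimal sketch of one commonly used ranking metric, NDCG@k. The abstract does not say which metric the talk uses, so the choice of NDCG, the 0-4 grading scale, and the function names are all assumptions made for illustration.

```python
import math

def dcg_at_k(grades, k):
    """Discounted cumulative gain over the top-k results.

    `grades` lists relevance judgments in the order the ranker
    returned the documents (hypothetical scale: 0 = irrelevant, 4 = perfect).
    """
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades[:k]))

def ndcg_at_k(grades, k):
    """DCG normalized by the DCG of the ideal (best possible) ordering."""
    ideal = dcg_at_k(sorted(grades, reverse=True), k)
    return dcg_at_k(grades, k) / ideal if ideal > 0 else 0.0

# Hypothetical judgments for one query, in the order two candidate models ranked them.
model_a = [4, 2, 0, 1]
model_b = [0, 1, 2, 4]
print(ndcg_at_k(model_a, 4), ndcg_at_k(model_b, 4))
```

A score of 1.0 means the model returned the documents in the ideal order for that query. Averaging the score over many judged queries gives one way to compare candidate models before they reach an A/B test.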