Learning to Rank for Faceted Search: Bridging the gap between theory and practice

06/13/2017 - 15:20 to 16:00
long talk (40 min)

Session abstract: 

Learning to Rank (LTR) is gaining popularity as a method to personalize and improve the ranking of search results. I will outline how LTR can be used to improve ranking in a faceted search system, based on an implementation by my colleagues and me, and focus on some gaps between theory and practice.

The most important gaps are that virtually all LTR literature assumes a direct search approach in which both the query and the document are represented as a bag of words, and that a simple linear function over language- and retrieval-model features often suffices as the machine-learned model. In faceted search applications, where users search along several dimensions of different types (such as brand, location, price, and recency), applying this standard approach to LTR barely goes beyond adjusting facet field weights.
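To illustrate why a linear model over per-facet match scores amounts to little more than field weighting, here is a minimal sketch. The facet names, feature values, and weights are all hypothetical and are not taken from any real system; the "learned" weights simply stand in for the output of a linear LTR model.

```python
# Hypothetical facets; each feature is a per-facet match score in [0, 1].
FACETS = ["job_title", "location", "skills", "recency"]


def linear_score(weights, features):
    """Score a document as a weighted sum of its per-facet match scores."""
    return sum(weights[f] * features[f] for f in FACETS)


def rank(weights, candidates):
    """Return candidate ids ordered by descending linear score."""
    return sorted(
        candidates,
        key=lambda doc_id: linear_score(weights, candidates[doc_id]),
        reverse=True,
    )


# Uniform field weights, as a search engineer might start with.
hand_tuned = {"job_title": 1.0, "location": 1.0, "skills": 1.0, "recency": 1.0}

# Assumed output of a linear LTR model: structurally just another set of field weights.
learned = {"job_title": 2.0, "location": 0.5, "skills": 1.5, "recency": 0.2}

# Made-up match scores for two candidate documents.
candidates = {
    "cv_a": {"job_title": 0.9, "location": 0.1, "skills": 0.4, "recency": 0.2},
    "cv_b": {"job_title": 0.3, "location": 1.0, "skills": 0.5, "recency": 0.9},
}

print(rank(hand_tuned, candidates))
print(rank(learned, candidates))
```

The two weight sets produce different orderings, but the model family is identical: a linear model can only re-weight facets, and cannot express interactions between them.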

I will explain how we integrated LTR into Textkernel's faceted search system, which matches vacancies and CVs and in which queries and documents may have many dimensions of different types. I’ll outline how we use LTR for more than tuning facet field weights by selecting and crafting the right algorithms and features. I’ll also cover how to extract features from queries and documents efficiently and how to apply reranking models with minimal impact on execution times. Finally, I’ll touch on how a search interface or log data can be used to collect satisfactory user feedback to keep improving the models.
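One common way to limit the cost of a reranking model is to apply it only to the top k results of the initial retrieval and leave the tail untouched. The sketch below assumes that pattern; it is not necessarily how the system described here works. The field names, the toy scoring function, and the 25 km radius are hypothetical, chosen to show a feature interaction (title match combined with distance) that plain field weights cannot express.

```python
def rerank_top_k(results, score_fn, k):
    """Rerank only the first k results with score_fn; keep the rest in retrieval order."""
    head, tail = results[:k], results[k:]
    # sorted() is stable, so ties keep their original retrieval order.
    reranked = sorted(head, key=score_fn, reverse=True)
    return reranked + tail


def toy_model(doc, radius_km=25.0):
    """Hypothetical non-linear scorer: a title match counts far less outside the radius."""
    near = 1.0 if doc["km_from_query"] <= radius_km else 0.2
    return doc["title_match"] * near + 0.1 * doc["recency"]


# Made-up results, listed in the order returned by the initial retrieval.
results = [
    {"id": "v1", "title_match": 0.9, "km_from_query": 120.0, "recency": 0.5},
    {"id": "v2", "title_match": 0.7, "km_from_query": 5.0, "recency": 0.8},
    {"id": "v3", "title_match": 0.6, "km_from_query": 10.0, "recency": 0.1},
    {"id": "v4", "title_match": 0.95, "km_from_query": 2.0, "recency": 0.9},
]

reranked_ids = [doc["id"] for doc in rerank_top_k(results, toy_model, k=3)]
print(reranked_ids)
```

Only k documents are scored by the model, so the added latency is bounded by k rather than by the size of the full result set. The trade-off is that a relevant document outside the top k (here `v4`) is never promoted.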

No prior machine learning experience is required. Expect to leave this talk with an understanding of how you could integrate reranking into your own system and which pitfalls you might encounter.