BM25 demystified

06/07/2016 - 14:30 to 15:10
Frannz Club
long talk (40 min)

Session abstract: 

Lucene will change the default scoring from TF/IDF to BM25 in the next major release. So unless you really enjoy surprises you better learn about it now! TF/IDF was easy enough to understand intuitively but how is it with BM25? What do all these parameters do? And what do people mean when they say it is "probabilistic"? In this talk I will tell the story of how we came from the Probability Ranking Principle to BM25 with a minimum of math and a maximum of explaining. I will also show how BM25 differs from TF/IDF, what it means in practice and give and intuition on what the parameters of this method actually do. You will leave this talk feeling good about Lucene changing the default. And of course you will learn many fancy buzzwords to show off with during the breaks.