Acceptably inaccurate: probabilistic data structures

06/06/2016 - 11:00 to 11:40

Session abstract: 

Writing software for the Internet puts engineers face to face with traffic that makes doing even simple things difficult. Seemingly straightforward operations such as counting, set membership and set cardinality become either extremely slow or prohibitively expensive to do using traditional methods.

Help is at hand, however: probabilistic techniques are both fascinating and extremely useful. By sacrificing a predictable amount of accuracy, we can perform operations at scales we never thought possible, and fast! We'll introduce approaches such as Bloom filters for set membership, count-min sketch for frequency in streams, and HyperLogLog for cardinality. No maths PhD is required.

We'll look at the before and after effects of using these techniques in real-world scenarios, and present the libraries that you can go away and play with right now.