Apache Beam pipelines at 100TB+ scale using Apache Spark.

Scale
06/17/2019 - 14:50 to 15:10
Frannz Salon
short talk (20 min)
Advanced

Session abstract: 

At Seznam.cz, we are building a successful search engine that is used and loved by millions. Selecting the best possible content from the infinite internet to satisfy our users' needs requires processing massive data volumes every single day.

This talk will focus on our long-term journey of scaling Apache Beam to handle a 100TB+ data pipeline with exponential data skew, using the Apache Spark runner.