Using Apache Beam to create a unified benchmarking framework for streaming and batch systems

06/12/2017 - 11:00 to 11:40
Moon Lounge
long talk (40 min)

Session abstract: 

In the relational database world, established benchmarks such as the TPC suites evaluate the correctness and performance of different databases. In this talk we will motivate the need for a benchmark framework that evaluates both stream and batch processing systems. We will introduce Nexmark, a framework for evaluating queries over data streams, and discuss its implementation on Apache Beam and the properties that make Apache Beam the perfect tool for developing a benchmark framework. Nexmark was an integration test donated by Google as part of the Apache Beam incubation process, and we have been working to evolve it since. Nexmark not only bridges the gap in evaluating data processing frameworks, but also serves as a rich integration test that checks the correctness of both the Beam runners (for systems like Apache Spark, Apache Flink and Apache Apex) and the new features of the Beam SDK, which we will also present.