Accelerate big data analytics with Apache Kylin

Scale
06/18/2019 - 16:30 to 17:10
Maschinenhaus
long talk (40 min)
Intermediate

Session abstract: 

To achieve high-performance or interactive analytics on a big data set, most massively parallel processing (MPP) solutions resort to loading a large proportion of the dataset into memory and launching as many CPU cores as possible to deliver query results in time. By nature, MPP solutions can easily hit throughput bottlenecks at high concurrency, or budget limits when datasets grow too large to fit in memory.

Apache Kylin takes a different approach: it speeds up analytical queries with pre-built OLAP Cubes (essentially groups of aggregate tables). The Cubes are built with MapReduce/Spark on commodity hardware, so that large volumes of data can be handled at a reasonable cost.
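To make the idea of a Cube as "groups of aggregate tables" concrete, here is a minimal in-memory sketch in Python. It is not Kylin's implementation (Kylin builds Cubes with MapReduce/Spark); the fact table, the dimension names, and the `build_cube` and `query` helpers are all hypothetical and chosen only for illustration. The sketch pre-aggregates a sum for every combination of dimensions, so that a GROUP BY query becomes a lookup rather than a scan of the raw rows.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact table: (country, category, year, sales).
ROWS = [
    ("DE", "books", 2018, 10.0),
    ("DE", "books", 2019, 20.0),
    ("DE", "games", 2019, 5.0),
    ("US", "books", 2019, 7.0),
]
# Hypothetical dimension names, in the same order as the columns above.
DIMENSIONS = ("country", "category", "year")


def build_cube(rows, dimensions):
    """Pre-aggregate SUM(sales) for every subset of the dimensions.

    Each subset yields one aggregate table, so 3 dimensions give 2**3 = 8 tables.
    """
    cube = {}
    for size in range(len(dimensions) + 1):
        for idx in combinations(range(len(dimensions)), size):
            table = defaultdict(float)
            for row in rows:
                key = tuple(row[i] for i in idx)
                table[key] += row[-1]  # last column is the measure
            cube[tuple(dimensions[i] for i in idx)] = dict(table)
    return cube


def query(cube, group_by):
    """Answer a GROUP BY by picking the matching pre-built aggregate table."""
    key = tuple(d for d in DIMENSIONS if d in group_by)
    return cube[key]


CUBE = build_cube(ROWS, DIMENSIONS)
print(query(CUBE, {"country"}))          # total sales per country
print(query(CUBE, {"country", "year"}))  # total sales per country and year
```

The trade-off visible even in this toy version is that build cost and storage grow with the number of dimension combinations, while query cost no longer depends on the size of the raw data.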

This talk will cover the following topics:

- Apache Kylin background

- Why an OLAP Cube is needed for big data

- How Kylin builds the Cube on Hadoop

- Performance benchmark

- Use cases