Every week we introduce new speakers who will be on stage at #bbuzz 2015. Thanks to our program committee, we can present part of our new, eclectic program. Presentations range from beginner-friendly introductions to hot data analysis topics to in-depth technical presentations about scalable architectures. The conference presents more than 50 talks by international speakers on the three tags "search", "store" and "scale".
In-memory data pipeline and warehouse at scale using Spark, Spark SQL, Tachyon and Parquet
Radu is a big data engineer at Atigeo, working on scalable applications in order to make big data matter. He has contributed to ambitious open source projects and has long been a passionate supporter of the big data community. Most notably, he was involved with HackTm (a 400-person hackathon).
Ema is a software developer at Atigeo and a main committer on GitHub. She is a passionate engineer, interested in scaling algorithms and implementing statistical models. Currently, she works on big data analytics in healthcare apps. She also contributes to and organizes the Timisoara Big Data Meetup.
During the talk, Ema and Radu will give a live demo of building an in-memory data pipeline and data warehouse from a web console. Among other things, they will cover architectural guidelines and design patterns meant to help you achieve optimal CPU and memory usage for the best performance during large-scale processing and interactive querying. Topics include RDDs, shuffle, file consolidation, RDD persistence models (memory, disk, off-heap), serialization and Tachyon. They will also share tips and tricks for maximizing performance and working around the weaknesses of these technologies.
Cassandra at Yammer
Store - 40 min
Michal is a technical architect and tech lead interested in delivering data-driven solutions that scale. Since completing his PhD in theoretical computer science in the UK, he has worked on delivering software solutions that are scalable and enable organisations to move fast. He was a technical architect for one of the game teams at Playfish, the London-based EA game studio, and now works as a senior core services engineer at Yammer/Microsoft.
Among other things, Michal will share insights on migrating parts of Yammer’s messaging pipeline from a custom storage solution backed by Java BDB to Cassandra. His talk will cover data and capacity modeling supported by metrics, and zero-downtime production rollout. It will also explain how things started falling apart after three months of seamless operation, and how the developers coped with the problems that occurred during the implementation.