Scale – 40 min
How to gain insights from a fleet of vehicles without breaking a sweat
Michael Hausenblas is Chief Data Engineer EMEA for MapR, where he helps people tap the potential of Big Data by bridging the technical side (reliability, scalability, etc.) and the business side (RoI, TCO, etc.). His background is in large-scale data integration, the Internet of Things, and Web applications, and he is experienced in advocacy and standardization (World Wide Web Consortium). Michael is a contributor to Apache Drill.
One of the more mature areas of the Internet of Things (IoT) is the application of sensors in vehicles. Going beyond the mechanical and electrical challenges of deploying the sensors and delivering their readings, we will discuss scalable architectures by reviewing three applications (connected cars, trucks, and agricultural equipment). From a technological point of view, we will be dealing with message queues (Kafka and fluentd), stream processing platforms (Storm and Spark Streaming), and time series databases (InfluxDB and OpenTSDB). A live demo from the automotive domain is included in this talk.
Michael on Twitter @mhausenblas
Scale – 40 min
The Do's and Don'ts of Elasticsearch Scalability and Performance
Patrick works at codecentric and is Head of Development of the cloud startup CenterDevice, where he builds a highly scalable platform to easily store, find, and share data. He has a strong interest in Elasticsearch. Within codecentric, he has built up the Elasticsearch focus area and enjoys consulting on search technologies, as well as speaking and writing about his experience with Elasticsearch.
This talk builds on the combined experience that codecentric has gathered from developing and operating the Elasticsearch-based cloud service CenterDevice, as well as from various customer projects over the last two years. It formulates important lessons learned regarding Elasticsearch scalability and performance as easy-to-remember Do's and Don'ts, backed up with anecdotes from actual events. The topics covered range from mapping and query definition through data modeling to cluster configuration and zero-downtime re-indexing.
Scale – 40 min
Computing recommendations at extreme scale with Apache Flink
Till is a committer and PMC member of Apache Flink. He is currently working as a developer for dataArtisans, a Berlin-based start-up dedicated to improving Apache Flink. His main work focuses on enhancing Flink's scalability as a distributed system and building a large-scale machine learning library with Flink. Till has also contributed to Apache Mahout and is currently helping to add Flink support to the Mahout DSL.
This talk details our experience with implementing three variants of the ALS (Alternating Least Squares) algorithm to train a latent factor model using the Apache Flink system, and with scaling them to large clusters and data sets. To scale these algorithms to extremely large data sets, Flink’s functionality was significantly enhanced so that it can distribute and process very large records efficiently. A preliminary presentation of the results, scaling ALS to 28 billion user ratings, can be found here: http://data-artisans.com/computing-recommendations-with-flink.html
Till on Twitter @stsffap