Scale - 40 min
From Machine Learning Startup to Big Data Company
In 2004, Christoph co-founded the web development consultancy némata. He left in 2011 to co-found the Berlin-based real-time bidding and machine learning company mbr targeting, where he focuses on scalability, real-time processing and the real-world application of machine learning techniques.
Christoph will share insights on growing a real-time bidding (RTB) company from a two-person startup into a leading technology provider. He will present real-world examples of pitfalls, bad technology decisions and other things that can go wrong.
Buzzwords involved: Hadoop, Kafka, Spark, Impala, Redis, Aerospike, …
Search - 40 min
What's with the 1s and 0s? Making sense of binary data at scale
Nick is heavily involved in a number of Apache projects, such as Tika, POI and Chemistry, and has the good fortune to know many of the people involved in the Apache Big Data and Search space! When not helping out with Apache things, Nick works as the CTO of Quanticate, a Clinical Research Organization (CRO).
Nick will look at what a given blob of 1s and 0s actually is, be it textual or binary, and show how to extract common metadata from it, along with text, embedded resources, images, and maybe even the kitchen sink! To do so, he introduces Apache Tika, along with some other libraries. He will then look at how to roll this all out for a large-scale Search or Big Data setup, helping to turn those 1s and 0s into useful content at scale!
Search - 20 min
Practical t-digest Applications
Ted is a PMC member of the Apache Mahout, Apache ZooKeeper, and Apache Drill projects and a mentor for the Apache Storm, DataFu, Kylin, Flink and Calcite projects. He was the chief architect behind the MusicMatch (now Yahoo Music) and Veoh recommendation systems. He also built fraud detection systems for ID Analytics (LifeLock).
Implementations of the t-digest algorithm are easy to use and have been integrated into all kinds of software, from Elasticsearch to Apache Mahout. Ted will describe the basic algorithm and demonstrate the effect of some variations on it.