Every week we introduce new speakers who will be on stage at #bbuzz 2015. Thanks to our program committee, we can now present another part of our eclectic program. Presentations range from beginner-friendly introductions to hot data analysis topics to in-depth technical talks about scalable architectures. The conference offers more than 50 talks by international speakers, organized under the three tags "search", "store" and "scale".
You've got questions. We've got answers!
Search – 40min
Grant Ingersoll is the CTO and co-founder of Lucidworks as well as an active member of the Lucene community: a Lucene and Solr committer, co-founder of the Apache Mahout machine learning project, and a long-standing member of the Apache Software Foundation.
QA and NLP technology have finally hit the mainstream and are making information access easier and more personalized every day. Open source technologies make it easier than ever to build and deploy question answering technology. In this talk, we'll lay the foundation for building a next-generation QA system using open source tools such as Apache Solr and various open source NLP libraries, and demonstrate a working system able to answer real natural language questions.
Detecting Events on the Web with Java, Kafka and ZooKeeper
Scale – 40min
James is the VP Engineering, Product (Backend) at Brandwatch. He works on projects that try to analyse and make sense of large amounts of social media data.
Over the last year at Brandwatch, we have been building, experimenting with, and scaling a distributed cluster of JVMs that processes the data from our clients’ queries, detects influential mentions of their brands online, and alerts on unusual and important trends. We used Apache Kafka, ZooKeeper, and Spring to achieve this. This talk explains the architecture that we built, how it performs and scales, and some of the difficulties we’ve faced along the way.
Real Time Big Data Analytics with Kafka, Storm & HBase
Scale – 40min
Ameya Kanitkar is the lead architect building the real time analytics infrastructure that powers Groupon’s real time relevance and personalization systems. Before working on personalization infrastructure, he also led the design and development of the global message bus infrastructure at Groupon.
This talk covers the use case for our Kafka-Storm-HBase-Redis pipeline and how we use it to ingest over 3 million data points per second in real time, which in turn brings in millions of dollars in additional revenue. Specifically, we will discuss how we scaled this system for hundreds of millions of users, including solution choices, different techniques and strategies, and traditional and innovative approaches. The solution includes some interesting algorithmic choices to reduce data size, such as Bloom filters and HyperLogLog, as well as the use of big data technologies such as HBase, Kafka & Storm.
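For readers unfamiliar with the Bloom filters mentioned in the abstract, here is a minimal, generic sketch in Python. It is not taken from the talk and says nothing about how Groupon actually implements or uses them; the class name `BloomFilter`, its parameters, and the example keys are all hypothetical. It only illustrates the general idea: membership of a large set is tracked in a small, fixed-size bit array, at the cost of occasional false positives.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        # Standard sizing formulas: m = -n * ln(p) / (ln 2)^2 and k = (m / n) * ln 2.
        self.num_bits = max(
            1,
            math.ceil(-expected_items * math.log(false_positive_rate) / math.log(2) ** 2),
        )
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


# Hypothetical usage: remember which (user, deal) pairs have already been seen.
seen = BloomFilter(expected_items=1000, false_positive_rate=0.01)
for user_id in range(1000):
    seen.add(f"user:{user_id}|deal:7")

print("user:42|deal:7" in seen)    # an added key is always reported as present
print(len(seen.bits), "bytes used for 1000 keys")
```

The trade-off is that a lookup for a key that was never added can occasionally return true, so a filter like this suits cases where a rare false "already seen" is acceptable in exchange for a much smaller memory footprint.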