Every week we are introducing new speakers who will be on stage at #bbuzz 2015. Thanks to our program committee, we can now present part of this year's eclectic program. Presentations range from beginner-friendly introductions to hot data analysis topics to in-depth technical presentations about scalable architectures. The conference features more than 50 talks by international speakers, organized around the three tags "search", "store" and "scale".
Search - 20 min
Beyond significant terms
Andre is a Senior Consultant at Comperio AS, specializing in advanced search technology and machine learning. He has participated in several national and international research projects in Machine Translation and has published several state-of-the-art results in Natural Language Processing.
At #bbuzz he will talk about his work on projects based on Elasticsearch and its significant terms aggregation, which aim to learn from documents and social activity published on the web. Andre will show how to extend the base functionality provided by Elasticsearch to cover areas such as immediate trends, entity identification and topic building, using additional techniques from Information Retrieval (IR) and Natural Language Processing (NLP).
Search - 20 min
Low latency scalable web crawling on Apache Storm
Julien runs DigitalPebble Ltd, a consultancy based in Bristol, UK, specialising in open source solutions for text engineering. He is a member of the Apache Software Foundation and a committer on Apache Nutch and various other projects.
In his talk he will introduce Storm-Crawler, a collection of resources for building low-latency, large-scale web crawlers on Apache Storm. He will compare it with similar projects such as Apache Nutch and present several use cases in which Storm-Crawler is being used. His focus will be on how Storm-Crawler can be combined with Elasticsearch and Kibana to crawl and index web pages.
Search - 40 min
A complete Tweet Index on Apache Lucene
Michael Busch is an architect in Twitter's Search & Content organization. He designed and implemented Twitter's current search index, which is based on Apache Lucene and optimized for real-time search. Michael has been a Lucene committer and Apache member for many years.
His session will present the architecture of Twitter's massive search engine. He will talk about how the search engine was developed, the challenges the team faced, and how the system scales as more and more tweets are posted.