Fast Data Processing Pipeline for Predicting Flight Delays Using Apache APIs: Kafka, Spark ML, Drill, with MapR-DB JSON
The ability to blend machine learning with real-time transactional data flowing through a single platform opens up new possibilities, such as enabling organizations to take advantage of opportunities as they arise. Leveraging these opportunities requires fast, scalable data processing pipelines that process, analyze, and store events as they arrive.
In this deep dive we will look at the architecture of a data pipeline that combines streaming data with machine learning to predict flight delays. You will see the end-to-end process required to build this application using Apache APIs for Kafka, Spark, Drill and other technologies:
- Apache Spark Machine Learning to build a model to predict flight delays.
- Kafka and Spark Streaming: Using the ML model with streaming data to do real-time analysis of flight delays.
- Spark Streaming and fast storage with MapR-DB JSON.
- Analysis of flight delay data and predictions stored in MapR-DB with Apache Spark, Apache Drill and OJAI.
The format will consist of lectures and labs on Zeppelin notebooks running on https://katacoda.com/. Zeppelin notebook code will be provided for download so that developers can try out the code on their own after the workshop. Developers can also download a complimentary ebook from MapR, which explains the code examples and more: https://mapr.com/ebook/getting-started-with-apache-spark-v2/
About the instructor:
Carol is a Solutions Architect at MapR. Previously she was an Apache Spark instructor and curriculum developer at MapR. Carol has experience working with Java technologies in many roles, including software development, training, technology evangelism and developer outreach. She has extensive experience as a software developer and architect, building complex mission-critical applications in the banking, health insurance and telecom industries. As a Technology Evangelist at Sun, Carol travelled worldwide, speaking and giving trainings. Before joining MapR, Carol worked as a senior developer for a health information exchange and as an architect on a massive OLTP Spring application managing more than 10 million loans for the consumer credit division of a leading automobile manufacturer and a leading bank. Carol also worked on pharmaceutical intranet applications for Roche in Switzerland, a telecom network management application for HP in France, an email server for IBM in Germany, and as a student intern for the National Security Agency. Carol holds an M.S. in Computer Science from the University of Tennessee and a B.S. in Geology from Vanderbilt University, and is a Sun Certified Java Architect and Java Language Programmer. Carol is also fluent in French and German.
Only 20 seats available.
plista GmbH, Torstraße 35, 5th floor, 10119 Berlin
tube: Rosa-Luxemburg-Platz (U2)
Friday, 14 June 2019
10:00am - 5pm CET
doors open 9:00am
AI.Monday is a networking series that aims to share knowledge of AI and encourage organizations to start their own AI journey.
Since September 2018, #aimonday has also taken place in Berlin. This is only natural, as Berlin is a start-up capital and, with 54% of all German AI companies, the fourth-largest global AI hub. Since then, AI Monday has been held every 5-6 weeks at changing locations.
We have teamed up with the organisers of AI.Monday Berlin, BerlinPartner, to bring it to Berlin Buzzwords. It takes place at Kesselhaus of Kulturbrauerei on Monday, 17th June, 2019 from 7pm-9pm. Places are free but limited. Berlin Buzzwords pass holders need not register; everyone else, please register here.
Ahmed Kamal, Machine Learning Platform Lead @ Careem: "Yoda: Scaling Machine Learning @ Careem"
At Careem our platform solves different challenging problems affecting the lives of our users across 120+ cities. Each of these problems requires a local and optimized solution. This emphasizes a strong need for A.I. In this talk, you will be walked through the journey of building our machine learning platform and the challenges addressed while trying to build a scalable, usable and cost-efficient platform that facilitates democratizing Machine Learning usage across different teams.
Calvin Seward, Research Scientist PhD with Focus on Deep Learning @ Zalando: "Security and AI"
Calvin will talk about the risks associated with AI-driven algorithms and the measures he and his team at Zalando have taken to overcome them.
Andreas Schindler, CEO & Founder – Deep Neuron Lab
Felix Biessmann, Professor for Machine Learning at Beuth University and the Einstein Center for Digital Future, Berlin: "Data Quality in Machine Learning Production Systems"
Machine learning (ML) algorithms have become a standard technology in production software systems. This imposes new challenges on the maintainers of software systems featuring ML components. While classical software systems can be tested before being put into production, such testing is difficult for machine learning systems: depending on the data ingested during the training or prediction phase, the behaviour of a system that learns from data can differ. Thus, ensuring robust and reliable functioning of ML systems requires careful monitoring and improvement of various data quality aspects, which can be difficult to automate. This talk summarizes some recent work on leveraging ML technology to automate the measurement and improvement of data quality for ML production systems.
Kesselhaus of Kulturbrauerei Berlin, Schönhauser Allee 36, 10435 Berlin
Monday, 17th June 2019
AI2Y.net workshop: KI Schnellbootprojekte (in German)
The FZI Forschungszentrum für Informatik, the Gesellschaft für Informatik e.V. (GI), the European Center for Information and Communication Technologies (EICT) and Berlin Partner für Wirtschaft und Technologie cordially invite everyone interested to the workshop "KI Schnellbootprojekte" ("AI speedboat projects") on 18 June 2019, held as part of the BMWi-funded project AI2Ynet ("AI Apply-It-Yourself Network").
The workshop takes place from 14:30 to 17:00 (admission from 14:00) in the Loft at the Palais der Kulturbrauerei. Participation is free of charge. The workshop will be held in German.
About AI2Ynet: The "AI2Ynet" project and the platform it will develop lay the foundation for an ecosystem for transferring AI innovations. The goal is to support small and medium-sized enterprises in finding and applying suitable AI technologies and components (e.g. algorithms, best practices/white papers, machine learning methods, datasets, etc.) through cross-industry networking and matchmaking of actors and technologies, and to open up new business models and exploitation opportunities. Speedboat projects: Within the AI2Ynet project, short, fast AI projects are to be funded from an early stage. The results of these projects are to be listed, in whole or in part, on the AI2Ynet platform. In this way a critical mass can be built up early, so that the platform can be put to the test again and again while it is still under development.
Goals of the workshop: We would like to work out with you the conditions under which such speedboat projects can be carried out successfully. Discuss with us how a smooth application and award process could work. Help us shape fair criteria for awarding projects and ground rules for carrying them out. Develop with us a viable concept that maximizes the benefit of the speedboat projects and simplifies the procedure as far as possible. We are not starting from scratch: we can already draw on positive prior experience, e.g. from the field of UX. We will present these experiences and materials in the workshop and discuss with you how they can be adapted.
Please note that during the event we will take photographs of participants for public relations purposes within the "AI2Ynet" project.
Palais der Kulturbrauerei Berlin, room "Loft"
Tuesday, 18 June 2019
Time: 14:30 - 17:00 (admission from 14:00)
Beam Summit Europe
The Beam Summit is a 2-day, multitrack event with the goal to bring together experts on Beam, new contributors and other participants interested in learning more about Apache Beam. Registration is free.
If you are interested in speaking or hosting a workshop at the Beam Summit Europe, please submit your session at Beam Summit CfP @ Sessionize. The deadline for submissions is March 31st. If you have additional questions, please contact the organizers at firstname.lastname@example.org.
June 19-20, 2019
Data Engineering Meetup #6 - Hosted by Zalando
18:00 - 18:30 Doors open - drinks, and discussions
18:30 Dirk Miethe (SAP) - Innovation with SAP Data Intelligence - from Data Lab to the production of Data Science
19:00 Emily Gorcenski (Thoughtworks) - Continuous Delivery for Machine Learning (CD4ML)
20:00 Alaa Elhadba (Native Instruments) - Data Vision in music production
20:30 Suyash Gark (Zalando) - Nakadi SQL - SQL engine for streaming queries over Nakadi Event Types
21:00 Networking and Get Together
21:45 Event end
Topics and Speaker Info:
"Innovation with SAP Data Intelligence from Data Lab to the production of Data Science"
Speaker: Dirk Miethe (SAP)
"Continuous Delivery for Machine Learning (CD4ML)"
Emily Gorcenski (Thoughtworks)
In this talk we'll explore continuous delivery (CD) for AI/ML along with case studies for applying CD principles to data science workflows, what makes them hard to apply, and what to look for as you integrate CD principles into data science workflows.
"Data Vision in Music Production"
Alaa Elhadba (Native Instruments)
Description: Native Instruments has begun a transformation journey to become a platform catering for musicians, producers, performers, & instrument builders. Our platform strategy requires a strong data culture, knowledge, & tools to turn data into high-quality actionable insights in every unit. This talk will cover the psychology of strategic decision making, building a data culture, how to start working with data, and the analytics vision and architecture at Native Instruments.
"Nakadi SQL - SQL engine for streaming queries over Nakadi Event Types"
Suyash Gark (Zalando)
Description: Nakadi is Zalando's in-house event-bus which is a RESTful abstraction over Apache Kafka. It allows defining entities called Event Types with JSONSchema and validates published events against it, ensuring data quality and consistency for data consumers. Nakadi is fully featured with a JSON based schema registry, access control and even has its own user interface.
To provide a centralized way for teams across Zalando to do stream processing easily, we developed Nakadi SQL, a RESTful interface for creating powerful SQL queries on Nakadi Event Types.
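The idea of validating published events against an event type's schema can be sketched in a few lines of Python. This is a toy illustration only, not Nakadi's API or implementation: the `EventBus` class, the `validate` helper, and the `order.created` event type are hypothetical names, and the validator covers just a small subset of JSON Schema (top-level `required` and per-property `type`).

```python
# Toy sketch: NOT Nakadi's API. All names here are hypothetical.

# Mapping from a few JSON Schema type names to Python types.
PY_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
}


def validate(event, schema):
    """Return a list of violations for a small JSON Schema subset."""
    errors = []
    for name in schema.get("required", []):
        if name not in event:
            errors.append(f"missing required property: {name}")
    for name, rules in schema.get("properties", {}).items():
        if name not in event or "type" not in rules:
            continue
        value, type_name = event[name], rules["type"]
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_bool_as_number = isinstance(value, bool) and type_name in ("number", "integer")
        if is_bool_as_number or not isinstance(value, PY_TYPES[type_name]):
            errors.append(f"property {name} is not of type {type_name}")
    return errors


class EventBus:
    """In-memory stand-in for an event bus with a schema registry."""

    def __init__(self):
        self.schemas = {}  # event type name -> schema
        self.streams = {}  # event type name -> list of accepted events

    def register(self, event_type, schema):
        self.schemas[event_type] = schema
        self.streams[event_type] = []

    def publish(self, event_type, event):
        # Reject invalid events so consumers only ever see conforming data.
        errors = validate(event, self.schemas[event_type])
        if errors:
            raise ValueError("; ".join(errors))
        self.streams[event_type].append(event)
```

Rejecting an event at publish time, rather than at read time, is what lets every downstream consumer rely on the schema without re-checking it.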
Code of Conduct:
Zalando is dedicated to providing a harassment-free experience for everyone, regardless of gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, race, age, nationality, cultural background, religion or lack thereof. We do not tolerate harassment of attendees in any form. Offensive and sexual language and imagery are not welcome at our events. Participants violating these rules may be asked to leave at the discretion of the event organisers.
Zeughofstraße 1 (exact location of the entrance 52.501930, 13.435585)
20 June 2019 - 18:00
Elasticsearch and Elastic Stack: Search and Beyond
Elasticsearch is the most widely used full-text search engine, but it is also very common for logging, metrics, and analytics. This workshop shows you what all the rage is about:
1. Overview of Elasticsearch and how it became the Elastic Stack.
2. Full-text search deep dive:
- How full-text search works in general and how it differs from databases.
- How the score or quality of a search result is calculated.
- How to handle languages, search for terms and phrases, run boolean queries, add suggestions, work with ngrams, and more with Elasticsearch.
3. Going from search to logging, metrics, and analytics:
- System metrics: Keep track of network traffic and system load.
- Application logs: Collect structured logs in a central location from your systems and applications.
- Uptime monitoring: Ping services and actively monitor their availability and response time.
- Application metrics: Get information from applications such as nginx, MySQL, or your custom Java applications.
- Request tracing: Trace requests through an application and show how long each call takes and where errors are happening.
And we will do all of that live, since it is so easy and much more interactive that way.
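As a rough illustration of the scoring item in the agenda above, here is a minimal sketch of an inverted index with BM25 ranking in plain Python. Elasticsearch uses BM25 as its default similarity, but this is not its implementation: the `TinyIndex` class and the regex tokenizer are hypothetical simplifications, and the formula is the textbook Okapi BM25 with the common defaults `k1=1.2` and `b=0.75`.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Very crude analyzer: lowercase, keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Hypothetical in-memory inverted index with BM25 scoring."""

    def __init__(self, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.doc_len = {}                  # doc_id -> number of tokens

    def add(self, doc_id, text):
        terms = tokenize(text)
        self.doc_len[doc_id] = len(terms)
        for term, tf in Counter(terms).items():
            self.postings[term][doc_id] = tf

    def search(self, query):
        """Return (doc_id, score) pairs, best match first."""
        if not self.doc_len:
            return []
        n_docs = len(self.doc_len)
        avg_len = sum(self.doc_len.values()) / n_docs
        scores = defaultdict(float)
        for term in tokenize(query):
            docs = self.postings.get(term, {})
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - len(docs) + 0.5) / (len(docs) + 0.5))
            for doc_id, tf in docs.items():
                # Length normalization: long documents are penalized.
                norm = 1 - self.b + self.b * self.doc_len[doc_id] / avg_len
                scores[doc_id] += idf * tf * (self.k1 + 1) / (tf + self.k1 * norm)
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)


index = TinyIndex()
index.add(1, "Elasticsearch is a search engine")
index.add(2, "Kibana visualizes logs and metrics")
index.add(3, "search search search for logs")
results = index.search("search engine")
```

In this example, document 1 ranks above document 3 for the query "search engine": the rare term "engine" contributes more than the repeated but more common term "search", whose term frequency saturates.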
Only 20 places available, so secure your ticket now here
Instructor: Philipp Krenn, Developer advocate @elastic
Philipp lives to demo interesting technology. Having worked as a web, infrastructure, and database engineer for more than ten years, Philipp is now a developer advocate at Elastic, the company behind the open source Elastic Stack consisting of Elasticsearch, Kibana, Beats, and Logstash. Based in Vienna, Austria, he is constantly traveling Europe and beyond to speak about and discuss open source software, search, databases, infrastructure, and security.
idealo GmbH, Ritterstr. 11, 10969 Berlin
nearest tube station is Moritzplatz or Kottbusser Tor
19 June 2019, 9am - 5pm, doors open at 8:30 am
MICES - Mix-Camp E-Commerce Search
MICES is a one day event on e-commerce search. The goal of the event is to bring together participants of different backgrounds (IT, product managers, UX designers, search managers, information retrieval specialists, search technology vendors, …) to discuss challenges, ideas, best practices and case studies in the e-commerce search domain.
The format of the event will be a mix of scheduled talks and self-organising sessions.
***The call for talks closes on 14 April 2019.***
myToys, Potsdamer Straße 192, 10783 Berlin
19 June 2019, 9am to 7pm (doors open at 8:30am)
ticket shop satellite events 2019
The satellite events above take place in various locations across Berlin on June 14th and 19th respectively. All events have a limited number of spaces available.
For the workshops, please register in the ticket shop below. You can also save your spot when buying a regular buzzwords ticket here. Just choose your preferred workshop during the ordering process in our Berlin Buzzwords ticket shop.
For MICES, please register on mices.co.