Streaming your shared ride

06/17/2019 - 15:20 to 16:00
long talk (40 min)

Session abstract: 

Fast data processing is essential for making Lyft rides a good experience for passengers and drivers. Our systems need to track and react to event streams in real time to update locations, compute routes and estimates, balance prices, and more. These use cases are powered by our streaming platform, which is based on Apache Flink.

Development tooling that is friendly to data science and machine learning is a key requirement for our users. Learn how we enable streaming SQL for feature generation and Python development via Apache Beam, so that each use case gets the most suitable development framework on top of a robust deployment stack.

Topics covered in this talk include:

  • Overview of use cases and platform architecture
  • Streaming source and event storage with Apache Kafka and S3; why both are needed for replay, backfill, and bootstrapping
  • Stateful streaming computation with scalability, high availability, and low-latency processing on Apache Flink
  • Development frameworks for varying abstraction levels and the language-to-use-case fit for Java, SQL, and Python
  • Python with Apache Beam as the bridge from a data science and machine learning friendly environment to distributed execution on Flink
  • Kubernetes-based deployment to abstract infrastructure and simplify operations of stateful Flink applications
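
To illustrate the point about needing both a long-term event store and a live stream, here is a minimal sketch in plain Python. It is not Lyft's implementation and uses no Kafka, S3, or Flink APIs: `Event`, `bootstrap_then_stream`, `count_per_ride`, and the sample data are hypothetical names invented for this example. The assumption is that the archive holds the full history while the live stream retains only recent events, so state is bootstrapped from the archive and the overlap is skipped by offset.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass(frozen=True)
class Event:
    offset: int   # position in a hypothetical, totally ordered event log
    ride_id: str
    value: int


def bootstrap_then_stream(archive: Iterable[Event],
                          live: Iterable[Event]) -> Iterator[Event]:
    """Yield archived events first, then only live events not yet seen."""
    last_offset = -1
    for event in archive:
        last_offset = max(last_offset, event.offset)
        yield event
    for event in live:
        # The live stream may overlap the archive; skip what was replayed.
        if event.offset > last_offset:
            last_offset = event.offset
            yield event


def count_per_ride(events: Iterable[Event]) -> Dict[str, int]:
    """A tiny stateful computation: running total of values per ride."""
    totals: Dict[str, int] = {}
    for event in events:
        totals[event.ride_id] = totals.get(event.ride_id, 0) + event.value
    return totals


# Stand-in for the long-term store: the full history.
ARCHIVE = [Event(0, "a", 1), Event(1, "b", 2), Event(2, "a", 3)]
# Stand-in for the live stream: only recent events, overlapping at offset 2.
LIVE = [Event(2, "a", 3), Event(3, "b", 4)]

if __name__ == "__main__":
    print(count_per_ride(bootstrap_then_stream(ARCHIVE, LIVE)))
```

In this sketch, reading the live stream alone would miss offsets 0 and 1, and reading the archive alone would miss offset 3; combining the two gives the complete state without double counting the overlap.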