Data streaming is emerging as a new and increasingly popular architectural pattern for data infrastructure. Data streaming architectures embrace the fact that data in practice rarely takes the form of a static data set, but is continuously produced as streams of events over time. Moving away from centralized “state of the world” databases and warehouses, applications work directly on the streams of events and on application-specific local state that aggregates the history of those events.
Among the many disruptive promises of streaming architectures are:
- decreased latency from signal to decision
- a unified way of handling real-time and historic data processing
- time travel queries
- simple versioning of applications and their state (think git update/rollback)
- a simplified data processing stack.
This talk introduces the data streaming architecture paradigm and shows how to build a set of simple but representative applications using the open source systems Apache Flink and Apache Kafka. Delivered by the creators of the Apache Flink framework, the talk explains the building blocks of data streaming applications, including:
- event stream logs
- transformations and windows
- working with time
- application state and consistency
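To make the building blocks above concrete, here is a minimal, framework-free sketch of two of them: event-time tumbling windows and keyed application state. This is plain Python, not Flink or Kafka API code; the event tuples, key names, and `tumbling_window_sums` function are illustrative assumptions, standing in for events that a real job would consume from a Kafka topic and aggregate with Flink's windowing operators.

```python
from collections import defaultdict

# A hypothetical event stream: (event_time_seconds, key, value) tuples.
# In a real deployment these would arrive from a Kafka topic.
events = [
    (1, "sensor-a", 10),
    (3, "sensor-b", 5),
    (7, "sensor-a", 2),
    (11, "sensor-a", 4),
    (14, "sensor-b", 6),
]

def tumbling_window_sums(stream, window_size):
    """Assign each event to a fixed-size window by its event time,
    and keep a per-(window, key) running sum as local keyed state."""
    state = defaultdict(int)  # (window_start, key) -> running sum
    for event_time, key, value in stream:
        window_start = (event_time // window_size) * window_size
        state[(window_start, key)] += value
    return dict(state)

# Two 10-second windows: [0, 10) and [10, 20).
result = tumbling_window_sums(events, window_size=10)
```

Note that the state here is exactly the "aggregate of the history of events" described above: replaying the same event log always reconstructs the same state, which is what enables time travel and simple rollback.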