This talk covers the challenges of running Spark and Flink at scale in a Kubernetes cluster. The multi-tenant environment needed at larger scale brings additional challenges on top. Job profiles vary widely in size and runtime.
The talk will show some of the problems encountered on the Flink, Spark, and Kubernetes sides, the alternatives that were discussed, and the solutions that were used.
Topics include:
- deployment, tuning
- runtime variance, congestion
- cluster resilience / update behaviour
- monitoring, logs