As the world migrates to Kubernetes on the cloud, companies have also been migrating their data processing jobs. The big advantage of the cloud and Kubernetes is that you can add and remove resources as needed. This also allows different strategies for handling problems. However, there are challenges involved in running stateful and long-running applications on cluster managers. This talk will discuss how to leverage tools to run big data workloads, such as Spark, Flink, and HDFS, in an on-demand way on top of Kubernetes. We will also look at possible pitfalls and ways of addressing them.
Topics will include:
- Different opportunities in the cloud
- Tools that help with deployment and management
- Cluster auto-scaling
- Deployment of big data jobs
- Tuning and optimization on both the big data and Kubernetes sides