Hops: Multi-Tenancy and Streaming-First in an Open-Source SaaS Platform

06/13/2017 - 15:20 to 16:00
long talk (40 min)

Session abstract: 

Hops is a new European version of Hadoop that introduces new concepts to enable multi-tenant Streaming-as-a-Service. In particular, Hops introduces three abstractions: projects, datasets, and users. Projects are containers for datasets and users. They aim to remove the need for users to launch and manage their own clusters, which today are the only strong mechanism for isolating users and their data from one another.

In this talk, we will discuss the challenges of building multi-tenant streaming applications on both Spark and Flink over YARN using Hops concepts. Our platform, Hopsworks, provides an entirely UI-driven environment built only with open-source software. We will show how we use the ELK stack (Elasticsearch, Logstash, and Kibana) to log and debug running Spark streaming applications, how we use Grafana and InfluxDB to monitor them, and how Apache Zeppelin can provide interactive visualizations and charts to end users. We will also show how applications run within a 'project' on a YARN cluster, with the novel property that applications are metered and charged to projects. Finally, we will discuss our experiences running Streaming-as-a-Service on a cluster in Sweden with over 150 users (as of early 2017).