Over the years, we have built many data-analytics systems using components like Apache ZooKeeper, etcd, Consul, or homebrewed implementations of Raft. These components are used in a number of systems to perform what we call distributed coordination: master election, group membership, configuration metadata, locks, barriers, etc. Like many other systems we have seen abused, these components are often used in scenarios where they are convenient, but not strictly necessary. This observation begs the question of where it is necessary to use a distributed consensus primitive.
To understand when it is necessary to rely on a consensus primitive, we need to step back and understand precisely what such a consensus primitive provides and its association to all the problems that we have been using it for. There are many fundamental results in the academic literature that can be used here to explain the need to use a consensus primitive: the relationship between state-machine replication and atomic broadcast, the equivalence between atomic broadcast and consensus, and the equivalence between consensus and leader election. Even further, there is the famous Herlihy consensus hierarchy showing the strength of asynchronous shared memory primitives based on their equivalence to consensus. This hierarchy shows that some useful primitives (e.g., distributed registers) do not need consensus, showing that for many problems we come across when building distributed systems, it is possible to rely on weaker, possibly simpler solutions. Some other primitives, like compare-and-swap, are equivalent to consensus.
The goal of this presentation is to revisit the distributed consensus problem in the light of all these fundamental results, but keeping the discussion away from pure theory. We discuss such primitives and results in a very practical way, using our experience with the Apache ZooKeeper project, and argue that it might be possible to reduce the reliance on such consensus primitives, but consensus is certainly not going away because it is a fundamental problem in distributed computing.