How Akka.Cluster Achieves Partition Tolerance
Akka.Cluster is one of the most popular and useful parts of the Akka.NET ecosystem as a whole, but it’s also one of the most concept-heavy areas. We have a lot of literature on both the official Akka.NET documentation and elsewhere on our blog about concepts such as distributing state in Akka.Cluster, sharding data across cluster nodes using Akka.Cluster.Sharding, publishing messages across a cluster, and so on; however, that barely scratches the surface on the possibilities and uses of Akka.Cluster.
So my goal with this post is to provide a bit of an FAQ on some of the most important and central concepts needed to build and operate effective Akka.NET clusters.
Node Reachability vs. Membership
In Akka.Cluster there are two important, similar-looking concepts that every end-user should be able to distinguish:
- Node reachability - is this node available right now? Can I connect to it?
- Node availability - is this node a current member of the cluster? Is this node leaving? Joining? Removed?
When many users start working with Akka.Cluster, they operate from the assumption that these two concepts are the same. “If I kill a process that is part of an Akka.NET cluster, that process will no longer be part of the cluster.”
This assumption is incorrect and there’s an important distributed computing concept at work behind this distinction: partition tolerance.
In terms of the CAP theorem, Akka.Cluster provides an AP experience out of the box; Akka.Cluster developers typically trade away some of the cluster’s default availability and partition tolerance (A & P) in exchange for consistency in areas where their domains demand it.
Akka.Cluster’s partition tolerance abilities come from this “reachability” feature; in order to tolerate partitions you have to know where they are and what resources are affected by them...