How do you keep your actor system from falling apart when things go wrong?
The short answer is Supervision.
What Is Supervision, and Why Should You Care?
What Is Supervision?
“Look at this mess you made. Now clean it up and start over!”
Just kidding. You’re doing great. Relax.
But I bet you’ve heard (and probably said) something like it before. Supervision in the actor model works much the same way: a parent monitors its children for errors and decides how to clean up the mess when one occurs.
Why Should You Care?
Supervision is the basic concept that allows your actor system to quickly isolate and recover from failures.
Supervision runs from the top to the bottom of the actor hierarchy. When part of your application encounters an unexpected failure (an unhandled exception, a network timeout, etc.), that failure is contained to only the affected part of the hierarchy. All other actors keep working as though nothing happened. We call this “failure isolation.”
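To make failure isolation a little more concrete, here is a toy sketch in plain Python. It is not the API of any actor framework: the names `Child`, `Parent`, `send`, and `restarts` are all made up for illustration, and messages are delivered synchronously rather than through mailboxes as a real actor system would do. It only shows the idea that a parent catches a child's failure, replaces that one child, and leaves its siblings untouched.

```python
class Child:
    """A toy 'actor' (hypothetical): processes messages and may fail."""

    def __init__(self, name):
        self.name = name
        self.processed = []

    def receive(self, message):
        # Simulate an unexpected failure on one particular message.
        if message == "boom":
            raise ValueError(f"{self.name} cannot handle {message!r}")
        self.processed.append(message)


class Parent:
    """A toy supervisor (hypothetical): owns children and handles their failures."""

    def __init__(self, child_names):
        self.children = {name: Child(name) for name in child_names}
        self.restarts = {name: 0 for name in child_names}

    def send(self, name, message):
        try:
            self.children[name].receive(message)
        except Exception:
            # Clean up the mess: replace only the failed child with a
            # fresh instance. Sibling children are not affected.
            self.children[name] = Child(name)
            self.restarts[name] += 1


parent = Parent(["a", "b"])
parent.send("a", "hello")
parent.send("b", "hi")
parent.send("a", "boom")        # only child "a" fails and is restarted
parent.send("b", "still here")  # child "b" carries on as before
parent.send("a", "back again")  # the fresh "a" accepts new messages
```

After this runs, child `b` still holds both of its messages, while child `a` has been restarted once and holds only the message it received after the restart. Restarting is just one possible response to a failure; it is used here because it is the simplest to show.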
How is this accomplished? Let’s find out…