The Business Case for Actors and Akka.NET

From the 1980s to Present Day

Akka.NET is a .NET implementation of the actor model.

The actor model is an old technology, originating in 1973 as an approach to parallel computing at a time when it looked like the computers of the future might be constructed using thousands of small, low-powered CPUs. History didn’t turn out that way thanks to Moore’s Law; CPUs became faster and faster and modern machines were developed with a small number of very high-powered CPUs.

Despite that, the actor model is immensely popular and runs some of the world’s most important software today. Amazon’s SimpleDb, RabbitMQ, Riak, CouchBase, Goldman Sachs, Motorola, Blizzard Games, Cisco, eBay, Credit Suisse, AMN Healthcare, Bank of America, McGraw Hill Financial, and scores of other major organizations use implementations of the actor model to power mission-critical applications responsible for the world’s largest companies.

So why is the actor model so popular today? Why are so many businesses using it for mission-critical applications?

First Adopters of the Actor Model: Telecoms

The truth of the matter is, the actor model has been popular for a long time through the Erlang programming language. Erlang was the first large-scale, production usable implementation of the actor model - developed originally by Joe Armstrong as a proprietary language at Ericsson in 1986 (open sourced later) to build telephone exchanges. Today it’s used to power the GPRS, 3G, and LTE cellular networks that depend Ericsson’s products.

Although the actor model was originally developed as a means for running applications on types of computer hardware that never really took off, the emergence of electronic computer networks in the late 70s and early 80s gave the actor model an extremely viable commercial application: distributed and concurrent systems.

As the Internet grew and more and more devices became interconnected, developers demanded a programming model to sanely, safely, and speedily manage all of the inputs and outputs flowing into their applications.

The telecoms of the 1980s were the first to realize that without such a model it would be impossible to build cellular networks, messaging systems, media delivery systems, and lots of other modern conveniences whose absence would be inconceivable in the 21st century.

But as the Internet became faster, more available, and connected to an increasingly diverse array of markets and devices the same business challenge that the telecom giants of the 1980s faced first reared its head in other markets such as finance, gaming, ecommerce, and more.

The Common Problem: Processing Everything Now

The actor model is used across a huge variety of industries today and it’s used to solve the same technical and business problem across all of those industries: processing everything now.

Let’s consider the question of speed first:

How effective would a cellular network be if there was a 30 second delay between transmitting and receiving a voice packet?
How effective would a trading platform be if trades couldn’t be successfully processed until the next business day?
How effective would a heart rate monitoring system be if doctors weren’t notified about a patient flat-lining until hours later?
How effective would an online advertising platform be if it couldn’t serve ads before the user scrolled past them?

Although there are some variations in the speed requirements between these different scenarios, they all have the same constraint: the value of each service is causally related to that service’s turn-around time. Heart rate monitors that report issues within seconds will be orders of magnitude more valuable than ones that take hours.

It’s not difficult for companies to build software products that can meet these time constraints for a single user or a single request - processing a single trade or monitoring a single patient in real-time is easy.

The second part of the challenge is the everything part - what happens when you have to process an unpredictably large and growing workload in real-time?

Most current software systems have tremendous difficulty scaling largely because the obvious ways to scale are wrong and aren’t resilient to inevitable network failures and constraints. So if our heartbeat monitoring system begins to start timing out once more than 1,000 patients are being monitored the business is going to have a difficult time onboarding a new customer who wants to add another 10,000 patients to the system.

This is exact scenario that has driven the adoption of the actor model in industry.

Everyone Has This Problem

The telecom industries were affected by the “everything now” problem first; then came the big financial houses of the world; then came the Internet advertising giants; then came the gaming platforms; then came the e-commerce shops; and now industries like healthcare, content, business intelligence, and even home automation are all affected by the issue of “real-time, at scale, right now.”

Your business is no different - speed kills. If you can’t deliver a consistent 5-star experience or if you can’t make your service available 24/7, you won’t be in business for long. Users expect and accept nothing less.

And this is why the actor model has had a sudden surge in adoption over the past 5-10 years - as the Internet and opportunities to acquire new business expand and as competitors get faster and faster all software developers have been tasked with providing experiences to users that are consistently fast, always available, and functionally correct at any scale.

This is where the actor model is without peer - it offers a programmatic model that solves the problem of scale-on-demand eloquently, easily, and effectively for engineering teams. And you don’t need a PhD in computer science to use it.

Simply stated: the actor model creates a programming environment that makes it possible to design systems that can respond scale linearly to demand - if one server can handle two million requests per second then three servers should be able to handle six million requests per second.

Let’s explore why.

What the Actor Model Guarantees

First we must explore what the actor model does.

There are lots of actor model implementations available today and most of those implementations all guarantee the following:

Actors are a fundamental unit of computation

Petabridge has previously dealt with the question of “what is an Akka.NET actor?”, but we’ll address it more concisely here.

Actors are a fundamental unit of computation in the actor model - they are isolated and independent microprocesses.

In the example above, we have two separate actors - neither actor does any work if they don’t have any messages to process. But once an actor does receive a message it is scheduled onto the CPU and performs work and the output of that work might include a message sent to another actor, which receives the message repeats the process of waking up and performing work.

At no point do either of these actors:

Share any state;
Introduce any side effects to each other in the event of a crash or failure;
Wait for each other in order to process their messages; and
Do any work or use any CPU resources if there are no messages to process.

Actors behave analogously to operating system processes but on a much smaller scale - they have their own protected memory and can only communicate with other processes through message-passing. Actors are microprocesses.

Multiple actors can run concurrently

Extending our “actors as microprocesses” analogy - just like how multiple OS processes can all execute concurrently on multiple cores built into a laptop or server, multiple actors can execute concurrently across multiple cores.

Actors with messages to process scheduling across multiple CPUs

When actors receive a message, they are scheduled for execution on a thread and the underlying runtime will execute the actor’s message-handling routines just like anything else. And if multiple actors receive messages simultaneously, they’ll all be scheduled in a work queue to share CPU time - just like any other type of thread in a modern operating system.

Once an actor finishes processes one or more messages (usually it’s more efficient for an actor to process a finite number of messages while they’re running on the CPU, before going to the back of the work queue again) they won’t run again unless they have additional messages that need processing.

It’s not uncommon for actor systems to have millions of actors running on a single server that has a small number of cores.

Actors communicate by passing messages to each other asynchronously

Much like how many forms of inter-process communication are facilitated using message passing, this is also how actors communicate. Unlike traditional object oriented forms of programming, where one actor invoke methods on another, in the actor model actors have handles or references to each other - and through these references actors are able to send messages.

The messages might just be passed in-memory for actors that reside locally on the same machine, or they may be serialized and delivered through a socket-based transport in the event of actors running remotely on other servers. That implementation detail is transparent to the developer through a property known as “location transparency.”

And all message passing is done asynchronously - in the following Akka.NET code snippet:

IActorRef actor = MyActorSystem.ActorOf(Props.Create(() => new MyActorClass()));
actor.Tell("any object can be a message");

The Tell method doesn’t wait for the destination actor to process the message before the method returns. The method waits until that actor’s mailbox queues the message at the tail of the message queue and then immediately returns, thereby making the act of sending a message asynchronous and decoupled from the act of processing a message. All actor implementations are designed with this asynchronous behavior.

Actors can create other actors

In all actor model implementations it’s possible for one actor to spawn a finite number of new actors - this can be done at actor startup, upon receiving a message, or any other scenario.

In Erlang, Akka, and Akka.NET these actors spawned by other actors are organized into a hierarchy or “family tree” structure, such as the one below:

The children were spawned by the parents, the grand children by the children, and so forth.

Actors can change the behavior used to process the next message at runtime

The last major trait of actors is that they can switch their behavior at run-time. This feature is mostly used in stateful processing scenarios, where developers implement Finite State Machines and other stateful processors.

Let’s suppose we’re dealing with the matter of authentication in a chat application - you might have the following three states for an actor representing a single user participating in the chat:

Unauthenticated;
Authenticating; and
Authenticated.

In all three states, the behavior for processing a single ChatMessage might be different:

Unauthenticated - discard the ChatMessage;
Authenticating - buffer the ChatMessage and wait until the results of the authentication action;
Authenticated - publish the ChatMessage to the chatroom.

Behavior-switching allows actors to change the behavior for processing the same type of message depending on the actor’s state.

Akka.NET Guarantees

This paper focuses on the Akka.NET implementation of the actor model specifically, and it’s important to note some of the additional guarantees that Akka.NET (and Akka and Erlang) makes to its end users.

Actors process messages in the order in which they are received

Akka.NET strongly guarantees that if you use any of the default mailbox implementations then the order in which your actor will process messages will match the order in which the mailbox received those messages - First-in, First-out (FIFO) order.

There’s some custom mailbox implementations, such as the Priority Mailbox which uses a priority queue to move high priority messages to the front of the receiving order.

All actors process one message at a time

With the exception of special built-in router actors in Akka.NET, all actors process exactly one message at a time.

This guarantee ensures that any changes made to an actor’s internal state are guaranteed to be atomic -there can never be simultaneous mutations of an actor’s state, therefore any action performed against that state during the processing of a message is inherently thread-safe.

All message classes must be immutable

This constraint falls onto the shoulder of the end-user, but here’s what it guarantees: if you distribute the same in-memory copy of a message to a thousand actors within the same process then each actor can make modifications to the message without creating any side effects for any other actors.

Parent actors supervise their children

In all production scenarios of Akka.NET end-users end up with a hierarchy of actors that are used to model various contexts and domains within their application. In each of these areas, if a child actor fails (throws an unhandled exception) then the parent actor will decide how to respond.

And a parent can respond to a child’s failure by doing any of the following:

Restarting the failed child back into its initial state;
Resuming the child, which effectively “discards” the exception without restarting;
Terminating the child, which stops it; and
Escalating the exception to the grandparent, which can cause a larger rolling restart.

This supervision model makes the handling of exceptions within a very large or very distributed application simple, as all decision making about the handling of errors can be determined granularly at a local level which makes for more reliable behavior.

All actors have a well-defined lifecycle

The last Akka.NET-specific guarantee is that all actors follow the same lifecycle when they are starting, restarting, or shutting down.

Each of these lifecycle methods provides an opportunity to perform initialization, cleanup, and any special behavior needed specifically during restarts.

How Akka.NET Solves the “Process Everything Now” Problem

So how does Akka.NET and the actor models’ capabilities help solve this critical “process everything now” problem faced by an increasingly large number of software companies?

Akka.NET actors eliminate the need to do error-prone, “shared state” concurrent programming

The most important facet of actors is that because actors are:

Isolated - one actor crashing does not affect any of its siblings;
Private - zero state is shared amongst actor instants;
Serial processors - actors only process one message at a time; and
Share information via immutable messages - no side effects from message-passing.

Then all operations executed via actors, no matter how many of them are running concurrently, are inherently thread-safe.

Compare this to traditional shared state concurrent programming which relies on synchronization mechanisms such as locks and semaphores to protect shared memory. Even for developers who are highly experienced in these areas it’s still quite easy to run into race conditions, deadlocks, live locks, unsafe state, and data corruption in that environment.

Akka.NET eliminates most of those categories of problems entirely due to the isolation and message-processing guarantees of its actors.

Akka.NET actors are inherently cloneable, which decreases code footprint and increases throughput

Actors are cheap - it’s not uncommon to see production actor systems run with several million actors active at any given time. Therefore this creates a modeling approach that drastically decreases reduces the amount your developers have to write in order to build effective actor-based applications.

Here’s a question we often ask when introducing some of the Akka.NET actor composition patterns we teach in our design patterns training:

Suppose your actors have to actively track and respond to changes made to business entities in real-time, and you had to be able to track changes to between 1 and N of them concurrently - what would be easier to do?

Write 1 actor class that is responsible for managing all entities simultaneously and run exactly 1 copy of that actor OR

Write 1 actor class that is responsible for managing 1 entity and run N concurrent instances of that actor.

Overwhelmingly, the second option is clearly the better choice - but it’s the choice that isn’t obvious until you explore the actor model.

Designing an actor that manages a single entity requires very little code compared to the alternative, and Akka.NET makes it very inexpensive to create new actors on-the-fly as needed. So you end up with less code but there’s also a second benefit: multiple instances of these actors can all be run concurrently, which increases overall system throughput compared to the omnibus actor.

In general the actor model is very good at atomizing code down to small, single-purpose parts. Users who’ve converted older projects to the actor model reported being able to reduce their code footprint by as much as 83%. That’s 83% less code to have to train to new developers, less code to test, less code to debug, and less code to maintain across versions.

Akka.NET actors give you location transparency, so you can scale your applications up and out across the network without any configuration changes

One of the biggest benefits of Akka.Remote and Akka.Cluster, the two modules that serve as the foundation for high availability programming with Akka.NET, is that none of your actors need to be programmed any differently to work with actors over the network. An IActorRef can be a LocalActorRef or a RemoteActorRef as far as you know - so this gives your developers the ability to achieve that linear scalability mentioned earlier in this paper.

Add more servers with more actors when demand is high; remove servers and consolidate your actors onto fewer machines when demand is low. Akka.NET makes it possible to scale your software to its demand with no configuration changes or single points of failure.

Akka.NET actors give you the ability to build self-healing systems that can recover from failure

Network failures are a matter of “when,” not “if,” in all distributed systems. Due to the parental supervision model of Akka.NET and the tools for remote failure detection built into Akka.Remote and Akka.Cluster, it’s possible to build systems that can quickly isolate and recover from failures that occur in production.

There’s a popular saying among the Akka and Erlang communities: “Let it crash.”

The idea behind this is that if any type of hardware failure should require human intervention in a production system, then that system isn’t inherently scalable. Therefore - it’s the philosophy of actor systems that the path to recovering from a failure should be very simple and tested frequently.

Akka.NET has robust capabilities for detecting and handling all manner of failures - whether they be program failures or network failures and makes it easy for its users to decide in a context-specific way how to recover from them, whether it be restarting a failed actor or redeploying a group of abruptly terminated-by-failure actors onto a new machine.

Akka.NET has a large community and implements a time-tested model proven to scale in production

Akka.NET and its lineage (Akka and Erlang) are battle-tested models that are proven quantities in production. And Akka.NET is all open source (Apache 2.0,) so it’s free to use in commercial applications.

At the time of writing this, a single Akka.NET ReceiveActor can process 4.5 million messages per second on a single core - and that number will grow linearly with the target machine’s core count.

And there are thousands upon thousands of users - Akka.NET Bootcamp has had more than 3,000 developers complete it since last year and there are more than 70,000 installations of Akka.NET NuGet packages to-date. The community is large and getting bigger all the time.

So if you’re facing scaling challenges with your business, don’t reinvent the wheel - use something that works.

How to Get Started with Akka.NET

So if you’re considering using Akka.NET at work, here’s how to get started:

Complete Akka.NET Bootcamp, which will get you off to a fast start with the actor model’s concepts and Akka.NET’s implementation;
Take Petabridge’s live Akka.NET webinar training once you’re ready to get started on a proof of concept or customer-facing Akka.NET project; and
Consider doing an onsite Akka.NET training for your team or otherwise using Petabridge’s Akka.NET services to help get your project off to a fast start.

And as always, if you have any questions you can contact us at [email protected].

If you liked this post, you can share it with your followers or follow us on Twitter!

Written by Aaron Stannard on May 10, 2016

Read more about:
Akka.NET
Case Studies
Videos

Observe and Monitor Your Akka.NET Applications with Phobos

Did you know that Phobos can automatically instrument your Akka.NET applications with OpenTelemetry?

Click here to learn more.