Scaling Akka.Persistence.Query to 100k+ Concurrent Queries for Large-Scale CQRS

How we solved an acute event-driven scaling problem for users in Akka.NET v1.5.

One of our major engineering milestones for Akka.NET v1.5 (ships on February 28th, 2023):

Make CQRS a priority in Akka.Persistence

This blog post is about an interesting engineering challenge we had to solve to accomplish this for Akka.NET v1.5: supporting hundreds of thousands or even millions of concurrent Akka.Persistence.Query projection queries all targeting a single database.

There are many different facets of Akka.Persistence and Akka.Persistence.Query scalability that needed solving, but this post focuses on one acute problem users have in production with Akka.Persistence.Query today: large numbers of concurrent queries melting down production-grade database deployments.

Users reported Akka.Persistence.Query knocking their databases out with as few as 3-4 thousand streaming queries running at rates of once every 1-3 seconds - with several nodes all running similar workloads (so in aggregate, perhaps closer to 10k+ queries per second).
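To make the aggregate figure concrete, here is a rough back-of-the-envelope sketch of that arithmetic. The node count of 3 is purely an assumption standing in for "several nodes", and `aggregate_qps` is a hypothetical helper, not part of Akka.NET.

```python
def aggregate_qps(queries_per_node: int, interval_seconds: float, nodes: int) -> float:
    """Approximate queries per second hitting one database across all nodes."""
    return queries_per_node * nodes / interval_seconds


# Assumption: 3 nodes, each running the reported 3-4 thousand streaming queries.
low = aggregate_qps(3_000, 3.0, nodes=3)   # fewest queries, slowest polling
high = aggregate_qps(4_000, 1.0, nodes=3)  # most queries, fastest polling

print(f"{low:,.0f} to {high:,.0f} queries per second")
# prints "3,000 to 12,000 queries per second"
```

Under those assumptions the upper end lands in the same ballpark as the 10k+ queries per second mentioned above.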

Beginning in Akka.NET v1.5, we have eliminated this issue and have explicitly tested it up to 100,000 queries per second targeting a dinky SQL Server 2019 database running inside a Docker container. We suspect our design can scale up to support tens of millions of concurrent queries (although for reasons I get into later, end-users should use a different approach).

Here’s what we did.

Author’s Note: Petabridge turned 8 years old this month and over the years I’ve only written a couple of articles about our internal processes for developing software and running an open source software business. In late 2021 we began using OKRs - “Objectives and Key Results” - as a general management system for setting quarterly goals and allocating accountabilities across the organization.

OKRs took some adjusting to, but have worked out really well for us overall - in combination with using Notion to record daily plans, document critical procedures, write technical specifications, and track progress against our key results each week.

Tracking these OKRs and our daily work plans made it quite easy for me to summarize everything our team accomplished in each area of our business in 2022 - and I wanted to take advantage of that and show everyone the impact their work had in the previous year.

So to kick off 2023 I showed a summary of all of our key results in each area of our business to our team members and we spent a day going through them - what follows below are the sections about Akka.NET, what we accomplished last year, and what’s next in store for Akka.NET.

.NET 7.0's Performance Improvements with Dynamic PGO are Incredible

Akka.Remote is 33% faster, Akka.NET v1.5 is 75% faster in-memory.

.NET 7.0 was released to market last week and includes hundreds of major improvements across the board.

I ran Akka.NET’s RemotePingPong benchmark on .NET 7.0 shortly after installing the .NET 7.0 SDK - I’ll take every free lunch I can from the CoreCLR team.

Here’s how the numbers compare between .NET 6.0 and .NET 7.0 for RemotePingPong:

Lightbend's Akka License Change and Akka.NET

Akka's License Change Does Not Impact Akka.NET

N.B. For the purpose of clarity: “Akka” refers to Lightbend’s Scala / Java library and “Akka.NET” refers to the .NET Foundation library maintained by Petabridge. I have been very careful in my writing to ensure there is as little confusion as possible in this post.

TL;DR; Akka.NET is Not Impacted

Imagine my surprise this week: out of the blue one of the core committers to Akka.NET forwards me a link to a Lightbend blog post entitled “Why We Are Changing the License for Akka.”

Lightbend’s license change for the original Akka library has no impact on Akka.NET. All of Akka.NET’s source is still Apache 2.0 and anything we’ve ported from the original Akka library was also done under Apache 2.0.

Phobos 2.0 Released - OpenTelemetry Meets Akka.NET

Phobos 2.0 Now Released to Market, Includes OpenTelemetry Support, Akka.Hosting, and More

As of today, Phobos 2.0, our fully OpenTelemetry-enabled instrumentation library for Akka.NET, is now available for production use.

Phobos 2.0 Instruments Akka.NET with OpenTelemetry

The key features of Phobos 2.0 are as follows:

  1. Requires no instrumentation code on the part of the end-user;
  2. Automatically creates and propagates OpenTelemetry traces during actor messaging, creation, Ask<T>, crashes, and restarts;
  3. Automatically records OpenTelemetry metrics for message processing by message type / actor type, mailbox depth, message processing latency, log rates, error rates, and more;
  4. Automatically records OpenTelemetry metrics for the state of the Akka.NET cluster - including the number of unreachable members, members by status, and so on;
  5. Includes enhanced noise control for OpenTelemetry tracing via the ITraceFilter interface, which allows you to suppress the creation of unwanted trace data in order to reduce cost, noise, and resource consumption;
  6. Measures latency on message-processing activity from the point at which the message is initially created, so in-flight time over Akka.Remote or time spent in queue can now be easily observed and monitored;
  7. Uses Akka.Hosting to make Phobos a HOCONless installation experience;
  8. Is high performance - tracing obviously produces some additional CPU, memory allocation, and bandwidth overhead but Phobos metrics are allocation-free and very performant; and lastly
  9. Comes with ready-made Akka.NET dashboards for many popular metrics and data visualization platforms.

Phobos’ license fees are still the same as they were for Phobos 1.0 - $4000 per organization per year, and you can buy Phobos instantly with a 30-day money-back guarantee through Sdkbin.

Introducing Akka.Hosting - HOCONless Akka.NET Configuration and Runtime

Best Practices and Patterns for Asynchronous Programming with Akka.NET Actors

In our Akka.NET Community Standup on March 9th, 2022 we presented for the very first time Akka.Hosting - a new approach to configuring Akka.NET and managing ActorSystems that requires zero HOCON, automatically enforces Akka.NET best practices, is type-checked, and makes it easy to pass IActorRefs via Microsoft.Extensions.DependencyInjection using the brand new ActorRegistry construct. Although Akka.Hosting is part of the current Akka.NET v1.5 development effort underway, it is already available for use with Akka.NET v1.4 and is ready for production use.

In this post and accompanying video we demonstrate how to use Akka.Hosting to streamline Akka.NET configuration, ActorSystem life-cycle management, actor instantiation, dependency injection, and more.

If you want to see the code samples that go along with this video and blog post, you can find them here:

Announcing Petabridge.Templates 2.0 - Professional Akka.NET Application and Library Templates

Ready-made Akka.NET application and library dotnet new templates for creating professional-quality projects!

We’ve prepared a set of three professional-grade dotnet new templates (Petabridge.Templates) for use with your projects - these will help you get started right away with Akka.NET or developing your own NuGet libraries that you wish to distribute. We’ve built and maintained these for years in order to create our own internal and external projects here at Petabridge.

But first, I should probably tell you a personal story about how I improved the professionalism of my projects with build automation. Before I had the opportunity to work on a build system, I had always known the advantages of them and envied projects that had them.
On Twitter I read tweets from different .NET developers about creating powerful build automation; it felt like rocket science to me and I secretly wished to be doing something like that. I was stuck, and I needed a boost to move me out of that state.

Async / Await vs. PipeTo in Akka.NET Actors

Best Practices and Patterns for Asynchronous Programming with Akka.NET Actors

Many years ago I wrote an article entitled “How to Do Asynchronous I/O with Akka.NET Actors Using PipeTo” - back when await support was still very much a work in progress. I subsequently wrote a major update to that article in 2016 - talking mostly about the benefits of PipeTo from a structural point of view.

I recently learned that this article has many of our own users under the impression that Akka.NET actors don’t support await semantics properly and that PipeTo is the only “blessed” pathway for working with asynchronous operations inside actors. This is not the case - await has had first class support inside actors for many years now.

That’s what this post and video cover: async, await, and PipeTo - how do they work differently, when should you consider using one over the other, and what are the “gotchas” that you need to know?

If you want to see the code sample that goes along with this video and blog post, you can find that here: https://github.com/Aaronontheweb/Akka.AsyncAwait
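The linked repository holds the actual C# sample. As a language-neutral illustration only, the sketch below mimics the two styles with Python's asyncio rather than Akka.NET itself. It assumes the commonly described behavior: awaiting inside a message handler holds up the mailbox until the operation finishes, while the PipeTo style lets the actor keep processing and delivers the result later as an ordinary message. Every name here (`run_actor`, `slow_io`, `pipe_to_self`) is hypothetical.

```python
import asyncio


async def slow_io(tag: str) -> str:
    """Stand-in for an asynchronous I/O call."""
    await asyncio.sleep(0.05)
    return f"done:{tag}"


async def run_actor(messages, use_pipe_to: bool) -> list:
    """Toy single-mailbox actor; returns messages in the order they were handled."""
    mailbox = asyncio.Queue()
    for m in messages:
        mailbox.put_nowait(m)
    handled = []
    tasks = []  # keep references so background tasks are not garbage collected

    async def pipe_to_self(coro):
        # PipeTo style: when the work finishes, enqueue its result as a message.
        mailbox.put_nowait(await coro)

    # Every input message eventually yields exactly one handled entry.
    while len(handled) < len(messages):
        msg = await mailbox.get()
        if msg.startswith("fetch:"):
            tag = msg.split(":", 1)[1]
            if use_pipe_to:
                tasks.append(asyncio.create_task(pipe_to_self(slow_io(tag))))
            else:
                # await style: nothing else is dequeued until this completes.
                handled.append(await slow_io(tag))
        else:
            handled.append(msg)
    return handled


print(asyncio.run(run_actor(["fetch:a", "ping"], use_pipe_to=False)))
# prints ['done:a', 'ping']
print(asyncio.run(run_actor(["fetch:a", "ping"], use_pipe_to=True)))
# prints ['ping', 'done:a']
```

In the await run the "ping" waits behind the I/O call; in the piped run it is handled first and the I/O result arrives afterwards as its own message. Which trade-off is preferable depends on whether the actor needs to preserve ordering around the asynchronous operation.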

Phobos Updates: Improvements and Upcoming OpenTelemetry Support

Tracing Ask and PipeTo Operations; Measuring Message Latency; and Phobos 2.0 Beta with OpenTelemetry Support

Since announcing Phobos 1.0 a year ago we’ve made many significant changes to it that add useful new ways to observe the run-time behavior of your Akka.NET applications in production.

Phobos is a commercial add-on for Akka.NET that allows developers to automatically record distributed traces and metrics from their actors without the need for any manual instrumentation code.

In this post we’re going to cover:

  • Improvements that have been made to Phobos all the way up through the v1.3.2 version released on December 21st, 2021;
  • Phobos 2.0, which will be built entirely upon OpenTelemetry and should be in beta soon; and
  • Our ongoing support plan for Phobos 1.x and 2.x over the next two years.

A concept I’ve been trying to put a name to is that of software application architectures that inherently lend themselves to a lower rate of technical debt accumulation than others. This post is my first attempt to do that.

Technical Debt

Technical debt is a term that is used frequently in our industry and while its meaning is commonly understood among experienced technologists, it’s not clearly defined.

In the abstract:

  • Technical debt is cost incurred from software design and implementation choices made in the past;
  • Interest on that technical debt accrues and compounds as a result of subsequent decisions that are layered onto the original set of choices; and
  • The full cost of technical debt is not known until, at some point in the future, there is a need to modify the software system which forces the development team to modify the original choices built into the existing system and calculate the level of effort needed to change them safely.