Scaling Akka.Persistence.Query to 100k+ Concurrent Queries for Large-Scale CQRS
How we solved an acute event-driven scaling problem for users in Akka.NET v1.5.
27 minutes to readOne of our major engineering milestones for Akka.NET v1.5 (ships on February 28th, 2023):
Make CQRS a priority in Akka.Persistence
This blog post is about an interesting engineering challenge we had to solve to accomplish this for Akka.NET v1.5: supporting hundreds of thousands or even millions of concurrent Akka.Persistence.Query projection queries all targeting a single database.
Fundamentally - there are many different facets of Akka.Persistence and Akka.Persistence.Query scalability that needed solving, but this post fixates on an acute problem users have in production with Akka.Persistence.Query today: large numbers of concurrent queries absolutely melting down production-grade database deployments.
Users reported APQ knocking their databases out with as few as 3-4 thousand streaming queries running at rates of once every 1-3 seconds - with several nodes all running similar workloads (so in aggregate, perhaps closer 10k+ queries per second.)
Beginning in Akka.NET v1.5, we have eliminated this issue and have explicitly tested it up to 100,000 queries per second targeting a dinky SQL Server 2019 database running inside a Docker container. We suspect our design can scale up to support 10s of millions of concurrent queries (although for reasons I get into later, end-users should use different approach.)
Here’s what we did.