How We Made Phobos 2.4's OpenTelemetry Usage 62% Faster

OpenTelemetry Performance Optimization Practices

Phobos is our observability and monitoring library for Akka.NET, and last year we launched Phobos 2.0, which moved our entire implementation onto OpenTelemetry.

Phobos 2.0 Instruments Akka.NET with OpenTelemetry

One of the issues we’ve had with Phobos is that many of our customers build low-latency, real-time applications, and adding observability to a software system comes with a noticeable latency and throughput penalty. At the beginning of July, a customer asked us: “is there any way you can make this faster?”

Challenge accepted - and completed.

Earlier this week we shipped Phobos 2.4.0, which is a staggering 62% faster than all of our previous Phobos 2.x releases. In fact, Phobos 2.4 is even faster than that in real-world applications, and we’ll get to that in a moment.

This blog post is about optimizing hot paths for maximum OpenTelemetry tracing and metrics performance, and about the techniques we used while developing Phobos 2.4.

Let’s dig in.

Akka.NET v1.5: No Hocon, No Lighthouse, No Problem

Exploring Akka.Hosting, Akka.HealthCheck, and Akka.Management

In our previous post we covered the Akka.NET v1.5 release, focusing in particular on the changes made to the core Akka.NET modules.

In this blog post we’re going to cover the three new libraries we’ve added to Akka.NET as part of the v1.5 development effort: Akka.Hosting, Akka.HealthCheck, and Akka.Management.

Akka.NET v1.5 is Now Available

Akka.Hosting, Akka.Management, Akka.HealthCheck, .NET 6 Dual Targeting, Akka.Cluster.Sharding Overhaul, and Many More Improvements.

As of today, Akka.NET v1.5 is available as a stable release package on NuGet for both .NET Standard 2.0 and .NET 6.0. This is a big release aimed at addressing pain points for current Akka.NET users.

We’ve published a detailed article on the Akka.NET website that describes what’s new in Akka.NET v1.5, but we wanted to capture some of the highlights here.

Scaling Akka.Persistence.Query to 100k+ Concurrent Queries for Large-Scale CQRS

How we solved an acute event-driven scaling problem for users in Akka.NET v1.5.

One of our major engineering milestones for Akka.NET v1.5 (ships on February 28th, 2023):

Make CQRS a priority in Akka.Persistence

This blog post is about an interesting engineering challenge we had to solve to accomplish this for Akka.NET v1.5: supporting hundreds of thousands or even millions of concurrent Akka.Persistence.Query projection queries all targeting a single database.

There are many different facets of Akka.Persistence and Akka.Persistence.Query scalability that needed solving, but this post focuses on one acute problem users face in production with Akka.Persistence.Query today: large numbers of concurrent queries absolutely melting down production-grade database deployments.

Users reported Akka.Persistence.Query (APQ) knocking their databases out with as few as 3-4 thousand streaming queries running once every 1-3 seconds, with several nodes all running similar workloads (so in aggregate, perhaps closer to 10k+ queries per second).
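To make the aggregate figure concrete, here is a back-of-the-envelope calculation. It assumes each streaming query issues one database round-trip per polling interval, and the node count of 3 is a hypothetical stand-in for "several nodes"; neither number comes from measured data.

```python
# Back-of-the-envelope estimate of aggregate Akka.Persistence.Query load.
# Assumption: each streaming query polls the database once per interval.
# NODES is hypothetical; the original text only says "several nodes".

def queries_per_second(concurrent_queries: int, interval_seconds: float) -> float:
    """Database round-trips per second generated by one node."""
    return concurrent_queries / interval_seconds

NODES = 3  # hypothetical node count

# Bounds from the reported range: 3-4k queries polling every 1-3 seconds.
low = queries_per_second(3_000, 3.0)   # 1,000 per node
high = queries_per_second(4_000, 1.0)  # 4,000 per node

print(f"per node: {low:,.0f} to {high:,.0f} queries/sec")
print(f"{NODES} nodes: {low * NODES:,.0f} to {high * NODES:,.0f} queries/sec")
```

Only the upper end of the range, multiplied across several nodes, reaches the 10k+ figure, which is why that number is framed as an aggregate rather than a per-node rate.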

Beginning in Akka.NET v1.5, we have eliminated this issue and have explicitly tested it up to 100,000 queries per second targeting a dinky SQL Server 2019 database running inside a Docker container. We suspect our design can scale to support tens of millions of concurrent queries (although, for reasons I get into later, end-users should use a different approach).

Here’s what we did.

Author’s Note: Petabridge turned 8 years old this month and over the years I’ve only written a couple of articles about our internal processes for developing software and running an open source software business. In late 2021 we began using OKRs - “Objectives and Key Results” - as a general management system for setting quarterly goals and allocating accountabilities across the organization.

OKRs took some adjusting to, but have worked out really well for us overall - in combination with using Notion to record daily plans, document critical procedures, write technical specifications, and track progress against our key results each week.

Tracking these OKRs and our daily work plans made it quite easy for me to summarize everything our team accomplished in each area of our business in 2022 - and I wanted to take advantage of that and show everyone the impact their work had in the previous year.

So to kick off 2023 I showed a summary of all of our key results in each area of our business to our team members and we spent a day going through them - what follows below are the sections about Akka.NET, what we accomplished last year, and what’s next in store for Akka.NET.

.NET 7.0's Performance Improvements with Dynamic PGO are Incredible

Akka.Remote is 33% faster, Akka.NET v1.5 is 75% faster in-memory.

.NET 7.0 was released to market last week and includes hundreds of major improvements across the board.

I ran Akka.NET’s RemotePingPong benchmark on .NET 7.0 shortly after installing the .NET 7.0 SDK - I’ll take every free lunch I can from the CoreCLR team.

Here’s how the numbers compare between .NET 6.0 and .NET 7.0 for RemotePingPong:

Lightbend's Akka License Change and Akka.NET

Akka's License Change Does Not Impact Akka.NET

N.B. For the purpose of clarity: “Akka” refers to Lightbend’s Scala / Java library and “Akka.NET” refers to the .NET Foundation library maintained by Petabridge. I have been very careful in my writing to ensure there is as little confusion as possible in this post.

TL;DR: Akka.NET is Not Impacted

Imagine my surprise this week: out of the blue, one of the core committers to Akka.NET forwarded me a link to a Lightbend blog post entitled “Why We Are Changing the License for Akka.”

Lightbend’s license change for the original Akka library has no impact on Akka.NET. All of Akka.NET’s source is still licensed under Apache 2.0, and anything we’ve ported from the original Akka library was also done under Apache 2.0.

Phobos 2.0 Released - OpenTelemetry Meets Akka.NET

Phobos 2.0 Now Released to Market, Includes OpenTelemetry Support, Akka.Hosting, and More

As of today, Phobos 2.0, our fully OpenTelemetry-enabled instrumentation library for Akka.NET, is available for production use.

The key features of Phobos 2.0 are as follows:

  1. Requires no instrumentation code on the part of the end-user;
  2. Automatically creates and propagates OpenTelemetry traces during actor messaging, creation, Ask<T>, crashes, and restarts;
  3. Automatically records OpenTelemetry metrics for message processing by message type / actor type, mailbox depth, message processing latency, log rates, error rates, and more;
  4. Automatically records OpenTelemetry metrics for the state of the Akka.NET cluster - including the number of unreachable members, members by status, and so on;
  5. Includes enhanced noise control for OpenTelemetry tracing via the ITraceFilter interface, which allows you to suppress the creation of unwanted trace data in order to reduce cost, noise, and resource consumption;
  6. Measures message-processing latency from the point at which the message is first created, so in-flight time over Akka.Remote or time spent queued in a mailbox can now be easily observed and monitored;
  7. Uses Akka.Hosting to make Phobos a HOCONless installation experience;
  8. Is high-performance: tracing inevitably adds some CPU, memory allocation, and bandwidth overhead, but Phobos metrics are allocation-free and very performant; and lastly
  9. Comes with ready-made Akka.NET dashboards for many popular metrics and data visualization platforms.
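The idea behind item 6, measuring latency from message creation rather than from the moment processing starts, can be illustrated with a small sketch. This is a hypothetical, language-neutral illustration in Python, not Phobos’ implementation or API: the `Envelope` type and `process` function are invented for this example. It uses a single-process monotonic clock; measuring across remote nodes would require wall-clock timestamps and would have to account for clock skew.

```python
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Envelope:
    """Hypothetical message wrapper that records when the message was created."""
    payload: Any
    created_at: float = field(default_factory=time.monotonic)

def process(envelope: Envelope) -> float:
    """Handle one message and return its end-to-end latency in seconds,
    measured from creation rather than from when it was dequeued."""
    # ... real message-handling work would happen here ...
    return time.monotonic() - envelope.created_at

mailbox: deque = deque()
mailbox.append(Envelope("hello"))
time.sleep(0.05)  # simulate time the message spends waiting in the mailbox
latency = process(mailbox.popleft())
print(f"end-to-end latency: {latency * 1000:.0f} ms")
```

Because the timestamp travels with the message, the measured latency includes the time spent waiting in the queue, which a timer started at dequeue would miss.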

Phobos’ license fees are unchanged from Phobos 1.0: $4,000 per organization per year, and you can buy Phobos instantly with a 30-day money-back guarantee through Sdkbin.

Introducing Akka.Hosting - HOCONless Akka.NET Configuration and Runtime

Best Practices and Patterns for Asynchronous Programming with Akka.NET Actors

In our Akka.NET Community Standup on March 9th, 2022, we presented Akka.Hosting for the very first time: a new approach to configuring Akka.NET and managing ActorSystems that requires zero HOCON, automatically enforces Akka.NET best practices, is type-checked, and makes it easy to pass IActorRefs via Microsoft.Extensions.DependencyInjection using the brand-new ActorRegistry construct. Although Akka.Hosting is part of the ongoing Akka.NET v1.5 development effort, it already works with Akka.NET v1.4 and is ready for production use.

In this post and accompanying video we demonstrate how to use Akka.Hosting to streamline Akka.NET configuration, ActorSystem life-cycle management, actor instantiation, dependency injection, and more.

If you want to see the code samples that go along with this video and blog post, you can find them here:

Announcing Petabridge.Templates 2.0 - Professional Akka.NET Application and Library Templates

Ready-made Akka.NET application and library dotnet new templates for creating professional-quality projects!

We’ve prepared a set of three professional-grade dotnet new templates (Petabridge.Templates) for use with your projects - these will help you get started right away with Akka.NET or developing your own NuGet libraries that you wish to distribute. We’ve built and maintained these for years in order to create our own internal and external projects here at Petabridge.

But first, I should probably tell you a personal story about how I made my projects more professional with build automation. Before I had the opportunity to work on a build system, I had always known the advantages of one and envied projects that had them.
On Twitter I read tweets from various .NET developers about creating powerful build automation; it felt like rocket science, and I secretly wished I were doing something like that. I was stuck and needed a boost to get me out of that state.