Find bottlenecks and sources of error, and track changes to your cluster in real time.
9 minutes to read
Phobos 2.10 is here, and it’s a game-changer for anyone running Akka.NET applications in production. This release doesn’t just incrementally improve observability - it fundamentally transforms how you understand and troubleshoot actor performance in your clusters.
The headline features: accurate backpressure measurement across all actors, a bird’s-eye view of your entire Akka.NET cluster’s activity, detailed actor performance analysis dashboards, and the ability to easily filter /system actors from /user actors. But here’s what makes this release special - it’s not just about the new metrics (though those are substantial). It’s about the beautiful, production-ready dashboards that make all this data instantly actionable.
Decouple your observability configuration from your application code with OTLP and collectors
19 minutes to read
We know OpenTelemetry deeply at Petabridge. We’ve built Phobos, an OpenTelemetry instrumentation plugin for Akka.NET, so we understand the low-level bits. Beyond that, we’ve been using OpenTelemetry in production for years on Sdkbin and we’ve helped over 100 customers implement OpenTelemetry configurations very similar to our own. Through all this experience, one thing has become crystal clear: the easiest, most production-ready approach to OpenTelemetry in .NET is using OTLP (the OpenTelemetry Protocol) with a collector.
In this post, I’ll walk you through why this approach beats vendor-specific exporters every time, show you exactly how to configure it, and demonstrate the real-world benefits we’ve experienced at Petabridge. This is the companion piece to my recent YouTube video on the topic.
The Problem with Vendor-Specific Exporters
When you’re getting started with OpenTelemetry for the first time in one of your projects, you know your team uses DataDog, or New Relic, or Application Insights. So naturally, the first thing you’ll try is figuring out how to connect your application directly to that specific tool.
You end up with something that looks like this:
```csharp
builder.Services.AddOpenTelemetry()
    .WithTracing(builder =>
    {
        builder
            .AddHttpClientInstrumentation()
            .AddAspNetCoreInstrumentation()
            // Coupling our app to vendor-specific implementations
            .AddDatadogTracing()        // Application code now depends on DataDog SDK
            .AddNewRelicTracing()       // And New Relic SDK
            .AddAzureMonitorTracing();  // And Azure Monitor SDK
    });
```
And you’re going to get frustrated doing this because of:
Vendor Coupling: Your application code is now directly coupled to vendor-specific SDKs...
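For contrast, here’s a minimal sketch of the OTLP-plus-collector approach this post advocates - a single vendor-neutral exporter pointed at a collector. The endpoint below is an assumption (4317 is the default OTLP/gRPC port); point it at wherever your collector actually listens:

```csharp
using OpenTelemetry.Trace;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddHttpClientInstrumentation()
        .AddAspNetCoreInstrumentation()
        // One vendor-neutral exporter - the collector decides where the data goes
        .AddOtlpExporter(otlp => otlp.Endpoint = new Uri("http://localhost:4317")));
```

Switching from DataDog to New Relic now means editing the collector’s configuration - your application code and its dependencies never change.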
Stop writing hundreds of lines of error handling code - there's a better way.
18 minutes to read
If you’re using Kafka in .NET, you’re probably writing hundreds of lines of code just to handle “what happens when my consumer crashes?” or “how do I retry failed messages?” or “what happens when I’m consuming messages too fast?”
What if I told you there was a way to handle all of that in just 5-10 lines of code?
That’s exactly what Akka.Streams.Kafka brings to the table - and it’s one of the most underrated parts of the entire Akka.NET ecosystem.
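To make that concrete, here’s a minimal sketch of a self-healing consumer. The broker address, group id, and topic are placeholders; the point is that RestartSource supplies the crash-and-retry behavior you’d otherwise hand-roll across hundreds of lines:

```csharp
using Akka.Actor;
using Akka.Streams;
using Akka.Streams.Dsl;
using Akka.Streams.Kafka.Dsl;
using Akka.Streams.Kafka.Settings;
using Confluent.Kafka;

var system = ActorSystem.Create("KafkaConsumer");

var consumerSettings = ConsumerSettings<Null, string>.Create(system, null, Deserializers.Utf8)
    .WithBootstrapServers("localhost:9092")   // placeholder broker address
    .WithGroupId("my-consumer-group");        // placeholder group id

// If the consumer crashes, restart it with exponential backoff - no
// hand-written retry loops or error-handling boilerplate required.
var streamCompletion = RestartSource.OnFailuresWithBackoff(
        () => KafkaConsumer.PlainSource(consumerSettings, Subscriptions.Topics("my-topic")),
        RestartSettings.Create(TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(30), randomFactor: 0.2))
    .RunForeach(record => Console.WriteLine(record.Message.Value), system.Materializer());
```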
Specifically, we’re going to address how Akka.Cluster deals with split brains - a type of network failure that breaks a once-functioning cluster apart into smaller, isolated islands that can no longer communicate with each other.
MCP is very useful, but it's not curing cancer. Here's why you should use it.
10 minutes to read
We haven’t talked that much about AI and LLM-driven development here at Petabridge, aside from a webinar we ran a year ago, but we’ve been using it heavily in our day jobs:
Just this week we deployed massive performance & architecture improvements to Sdkbin - and Claude / Cursor were absolutely essential in helping us design, test, and bug-fix those.
One of the tools that’s allowed us to apply LLM-assisted coding successfully to massive code bases like Akka.NET and Sdkbin is the Model Context Protocol (MCP) - and in this post + accompanying YouTube video, we’re going to explain what it is without dipping into the hyperbole you usually find on platforms like LinkedIn and X.
A safer, superior choice to using seed nodes with Akka.Cluster
15 minutes to read
Akka.Cluster is a very powerful piece of software aimed at helping .NET developers build highly available, low-latency, distributed software. At its core, Akka.Cluster is about building peer-to-peer networks—that’s what a “cluster” actually is: a peer-to-peer network that runs in a server-side environment controlled by a single operator.
What Clusters Need
This is a subject for another blog post, but the following qualities are what make peer-to-peer networks a superior choice over client-server networks for high availability:
Horizontally scalable, because the “source of truth” is decentralized and distributed to the endpoints of the network (these are your actors running in each process) rather than centralized in a single location;
Fault tolerant and resilient - having the source of truth decentralized and distributed also means that no single node in the network is so crucial that its disappearance is going to render the system unavailable; and
Still supports inter-dependent services - you can still have multiple services with completely different purposes and code cooperating together inside a peer-to-peer network. This is what Akka.Cluster roles are for.
In order to build a peer-to-peer network, you need two primary ingredients:
Topology-awareness - database-driven CRUD applications never need to do this. The load-balancer is aware of where the web servers are and the web servers are aware of where the database is, but that’s pretty much it. In a real peer-to-peer network, all applications need to know about each other and need to communicate with each other directly. These are what Akka.Remote (communication) and Akka.Cluster (topology) provide.
Initial formation - there must be a strategy for processes to form a new peer-to-peer network or to join an existing one.
In this blog post, we’ll be discussing item number 2—how to make the formation and joining of Akka.Cluster networks more reliable,...
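For context before we get there, here’s a minimal, illustrative sketch of the classic seed-node approach (the system name, addresses, and ports are made up) - the very configuration style this post offers an alternative to:

```csharp
using Akka.Actor;
using Akka.Configuration;

// Classic static seed-node configuration: every node must know, at deploy
// time, the fixed address of at least one seed node in order to join.
var config = ConfigurationFactory.ParseString(@"
    akka {
        actor.provider = cluster
        remote.dot-netty.tcp {
            hostname = localhost
            port = 8081
        }
        cluster.seed-nodes = [""akka.tcp://MyCluster@localhost:8081""]
    }");

var system = ActorSystem.Create("MyCluster", config);
```

Those hard-coded addresses are exactly the brittleness we’ll be addressing.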
Today, though, we’re writing about a brand new tool we’ve been working on for the past several months: Incrementalist v1.0, a command-line tool that leverages git and Roslyn solution analysis to drastically reduce build times for large .NET solutions and monorepos.
We’ve been using older versions of Incrementalist in production inside the Akka.NET build pipeline since 2019 - it cuts our average pull request build time down from about 1 hour and 15 minutes to ~15 minutes. Those older versions of Incrementalist just spat out the smallest possible build graphs as a .csv file - it was up to you to parse it and use the data accordingly.
Incrementalist v1.0 is a totally different animal: it runs the dotnet commands for you.
Akka.Persistence.Sql is the new flavor moving forward.
8 minutes to read
It was just about 10 years ago when we shipped Akka.NET v1.0.2, the release where we first introduced betas of some of our most popular Akka.Persistence plugins: Akka.Persistence.Postgres, Akka.Persistence.SqlServer, and Akka.Persistence.Sqlite.
All of these plugins were built on a shared ADO.NET Akka.Persistence architecture called Akka.Persistence.Sql.Common, and that architecture has served both us and our users / customers well over the past 10 years - to the tune of some 1.6 million installations!
In the next sections we’ll explain our decision and walk you through our migration guide for moving off of any of the affected Akka.Persistence.Sql.Common plugins and onto Akka.Persistence.Sql.
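As a quick preview, here’s a hedged sketch of what wiring up the new plugin looks like with the Akka.Persistence.Sql.Hosting package - the connection string is a placeholder, and the provider name you pick will vary by database:

```csharp
using Akka.Hosting;
using Akka.Persistence.Sql.Hosting;
using LinqToDB;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddAkka("MyActorSystem", (akka, _) =>
{
    // Placeholder connection string - point this at your real database
    akka.WithSqlPersistence(
        connectionString: "Server=localhost;Database=Akka;Trusted_Connection=True;",
        providerName: ProviderName.SqlServer2019); // pick the provider matching your DB
});
```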
A painful lesson on atomicity and the assignment of structs.
21 minutes to read
Over the past several months the Akka.NET team has had reports of the following Exception popping up unexpectedly throughout many of our plugins and end-user applications that use the Akka.Streams SelectAsync stage - such as Akka.Streams.Kafka and Akka.Persistence.Sql:
That error message seems simple enough - it comes from here inside GraphStage.cs:
```csharp
[InternalApi]
public void InternalOnDownstreamFinish(Exception cause)
{
    try
    {
        if (cause == null)
            throw new ArgumentException("Cancellation cause must not be null", nameof(cause));
```
In Akka.Streams parlance, a stream gets cancelled when an unhandled Exception is thrown and that error should be propagated all the way down to this GraphStage.InternalOnDownstreamFinish method so we can log why the stream is being cancelled / terminated.
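For readers who haven’t used it, here’s a quick hypothetical example of the stage in question - SelectAsync runs an async function for each element with bounded parallelism, and an unhandled Exception in that function is what kicks off the cancellation path described above (all names below are illustrative):

```csharp
using Akka.Actor;
using Akka.Streams;
using Akka.Streams.Dsl;

var system = ActorSystem.Create("Demo");

// Run up to 4 asynchronous operations in parallel. If any of those Tasks
// faults with an unhandled Exception, the stage fails and the stream is
// cancelled - the path that ultimately ends at InternalOnDownstreamFinish.
var doubled = await Source.From(Enumerable.Range(1, 100))
    .SelectAsync(4, async i =>
    {
        await Task.Delay(10); // stand-in for real async work, e.g. a DB write
        return i * 2;
    })
    .RunWith(Sink.Seq<int>(), system.Materializer());
```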
Here’s the mystery - this is the code that “threw” the Exception inside Akka.Persistence.Sql for instance: