Case Study: Luware Building Mission-Critical Voice Services with Akka.NET
How Luware's Nimbus handles millions of calling minutes per month using Akka.NET cluster sharding, event sourcing, and Phobos.

Every so often a customer comes along whose use case is so demanding that it validates every design decision you’ve ever made in the framework. Luware is that customer for us.
Jason Shave is an Engineering Manager at Luware, where he leads a team of developers working on Nimbus, contact center software that integrates with Microsoft Teams. Luware handles millions of calling minutes per month, serves over 1,000 customers worldwide, and operates 24/7 with zero tolerance for downtime. When I sat down with Jason, I wanted to understand why he picked Akka.NET, how his team uses it, and what the results have looked like in production.
What follows is that conversation.
Tell Us About Luware and What You Do
Jason Shave, Engineering Manager at Luware:
We build Nimbus, which is contact center software that integrates with Microsoft Teams. Think of everything involved in enterprise voice: call routing, queuing, transfers, recording, analytics. We plug directly into Teams so organizations can run their entire contact center operation through the platform they’re already using.
We’re global. We run 24/7. We handle multi-millions of calling minutes every month across more than a thousand customers. And the thing about voice is that it has to work. An old telephony guy once told me: “The dial tone comes from God.” Meaning, when you pick up a phone, you expect to hear that tone. You don’t expect it to buffer. You don’t expect a 500 error. It just works. That’s the bar we’re held to every single day.
What Problem Were You Trying to Solve?
The technical challenges Jason’s team faced are familiar to anyone building distributed, stateful systems at scale, but the consequences of failure are higher than most.
We had two core problems that were challenging us. The first was concurrency. We run a multi-microservice architecture, and when you’re managing live call state across distributed services, you end up in distributed locking hell. Race conditions everywhere. Coordination overhead that gets worse the more you scale. And when timing matters in milliseconds because you’re handling live voice, a race condition doesn’t just mean stale data. It means a dropped call.
The second problem was webhook routing. The telephony APIs we work with, Microsoft’s calling APIs, are fundamentally asynchronous. You make an API call to do something with a call, and the result comes back later as a webhook callback to your service. But your services are stateless. So when that callback arrives, it needs to find the right stateful call session, the specific instance that’s waiting for that response, somewhere in a cluster of pods that might have scaled up or down since the original request was made. In a stateless world, routing that callback to the correct in-memory context is a genuinely hard distributed systems problem.
We put a lot of time and effort into trying to fix the concurrency problems with traditional approaches like distributed locking. We also looked at building a custom solution and evaluated Dapr, but none of it fully solved the problem.
```mermaid
sequenceDiagram
    participant C as Client
    participant LB as Load Balancer
    participant HA as Host A
    participant HB as Host B
    participant HC as Host C
    participant Ext as External Service
    Note over HA,HC: Each host holds its own in-memory state
    C->>LB: POST /start-job
    LB->>HB: Route request (round-robin)
    HB->>HB: Store job state locally
    HB->>Ext: Call external API (callbackUrl = /webhook/job-123)
    LB-->>C: 202 Accepted
    Note over Ext: Time passes...
    Ext->>LB: POST /webhook/job-123
    LB->>HC: Route callback (round-robin)
    Note over HC: State not found!<br/>Job state lives on Host B,<br/>but the callback landed on Host C
    HC-->>Ext: 404 / 500 Error
    Note over C,Ext: The load balancer has no way to know<br/>which host owns the state for this callback
```
Figure 1. Webhook callback routing challenge, showing how an inbound webhook from a telephony API needs to locate the correct stateful call session across a cluster of stateless service instances.
How Did You Evaluate Solutions? Why Akka.NET over Orleans?
This is where the conversation got interesting. Every .NET developer’s first question when they hear “actor model” is “why not Orleans?” Jason looked at it seriously and walked away.
Honestly, Orleans felt like it was getting stale. Development activity had slowed down. The community seemed to be cooling off. For us, that’s a warning sign. We needed something with staying power.
Akka.NET had a different story. It traces back to Akka on the JVM, which is hugely popular on the Java side. I don’t know a ton about the Java ecosystem, but I knew it started there and that Petabridge had ported it to .NET. That kind of lineage matters. Orleans didn’t really have that as its own thing, so it could easily be cut away.
Then there was the community. When I started poking around the Akka.NET Discord, people were genuinely welcoming. Helpful. Engaged. Some of the other communities I’d looked at felt a little stuffy, if I’m being honest. That sounds like a soft factor, but when your team is learning a new paradigm, it makes a real difference in ramp-up time.
But the real deciding factor was support. With Petabridge, you get direct access to the people who actually write the framework. These are the guys that write the software. These are the guys that know it. You’re not dealing with tier three support. I’ve had cases where I was stuck on something, asked a question, and got code samples and a detailed explanation back. We ended up with a great projection system running because of that kind of hands-on help. I just would never have gotten that from the other side of the fence. There’s just no way.
And there’s a business alignment piece that gave me confidence. Petabridge’s entire livelihood depends on Akka.NET succeeding. They don’t have some giant cloud platform or enterprise suite pulling their attention elsewhere. Their survival is tied to this framework working well. When your vendor’s incentives are that aligned with yours, you can trust the investment will keep coming.
How Does Luware Use Akka.NET?
This is where the architecture gets genuinely elegant. Jason’s team built a cluster-sharded actor system with event sourcing at its core.
Location transparency solved the webhook routing problem immediately. With cluster sharding, each call session is an actor with a deterministic identity. When a webhook arrives, the message routes to the correct actor no matter which node in the cluster it lives on. No service discovery hacks. No sticky sessions. No external routing tables. It just works.
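To make that concrete, here is a minimal sketch of what deterministic, sharded routing can look like in Akka.NET. The message and actor names (`CallWebhook`, `CallSessionActor`, the `"call-session"` type name, the shard count) are illustrative assumptions, not Luware's actual code.

```csharp
using Akka.Actor;
using Akka.Cluster.Sharding;

// Hypothetical message carrying the call id parsed from the webhook path.
public sealed record CallWebhook(string CallId, string Payload);

// Maps every message to an entity id; the same CallId always resolves to
// the same actor, no matter which node in the cluster it lives on.
public sealed class CallMessageExtractor : HashCodeMessageExtractor
{
    public CallMessageExtractor(int maxNumberOfShards) : base(maxNumberOfShards) { }

    public override string EntityId(object message)
        => message is CallWebhook w ? w.CallId : null;
}

// Placeholder session actor; the real one would hold the live call state.
public sealed class CallSessionActor : ReceiveActor
{
    public CallSessionActor()
    {
        Receive<CallWebhook>(w => { /* handle the callback for this call */ });
    }
}

public static class CallSessionSharding
{
    public static IActorRef Start(ActorSystem system)
    {
        // Cluster sharding guarantees exactly one CallSessionActor per CallId
        // somewhere in the cluster; senders just Tell the shard region.
        return ClusterSharding.Get(system).Start(
            typeName: "call-session",
            entityProps: Props.Create(() => new CallSessionActor()),
            settings: ClusterShardingSettings.Create(system),
            messageExtractor: new CallMessageExtractor(maxNumberOfShards: 100));
    }
}
```

A webhook handler only needs the shard region reference: `region.Tell(new CallWebhook(callId, body))`, and the cluster resolves which node owns that session.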
The concurrency problem itself is solved because now we have state in memory. Each actor owns its state exclusively. No shared mutable state. No distributed locks. One actor, one call, one sequential message stream. The entire class of race conditions we used to deal with just disappears.
Event sourcing gives us durability. If a pod dies, the actor recovers on another node by replaying its event journal. No state lost. The call keeps going.
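The recovery-by-replay idea maps onto Akka.Persistence roughly like this sketch, assuming a hypothetical `ParticipantAdded` event; the real event model is Luware's own.

```csharp
using System.Collections.Generic;
using Akka.Persistence;

// Hypothetical event recorded for each call state change.
public sealed record ParticipantAdded(string ParticipantId);

public sealed class PersistentCallSessionActor : ReceivePersistentActor
{
    private readonly List<string> _participants = new();

    // Ties this actor's event journal to its sharded entity id, so the
    // same call recovers the same history on whichever node it restarts.
    public override string PersistenceId => $"call-{Self.Path.Name}";

    public PersistentCallSessionActor()
    {
        // On recovery (e.g. after a pod dies), journaled events are
        // replayed through these handlers to rebuild in-memory state.
        Recover<ParticipantAdded>(e => _participants.Add(e.ParticipantId));

        // Commands persist an event first; state mutates only after the
        // write is confirmed, so no acknowledged change is ever lost.
        Command<ParticipantAdded>(cmd =>
            Persist(cmd, e => _participants.Add(e.ParticipantId)));
    }
}
```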
But the star feature, the thing that really won me over, is behavior switching with Become. This is where the actor model shines for telephony. Think about a webhook event like “person added to the call.” In a normal inbound call, that means the agent answered. You route to the next step in the workflow. But in a consultative transfer, that same “person added” event means the transfer target picked up, and now you need to bridge the original caller with the new party and drop the transferring agent. Same payload, completely different workflow. The actor handles this naturally by switching its behavior to match the current call state.
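The “same payload, different workflow” idea can be sketched with `Become` like so; the message types and handler bodies are illustrative assumptions, not the production logic.

```csharp
using Akka.Actor;

public sealed record PersonAdded(string ParticipantId);
public sealed record TransferRequested(string TargetId);

public sealed class CallActor : ReceiveActor
{
    public CallActor() => Become(InboundCall);

    private void InboundCall()
    {
        // In a plain inbound call, "person added" means the agent answered:
        // advance to the next step of the workflow.
        Receive<PersonAdded>(_ => { /* route to next workflow step */ });

        // A transfer request switches the actor into a new behavior.
        Receive<TransferRequested>(_ => Become(ConsultativeTransfer));
    }

    private void ConsultativeTransfer()
    {
        // Same payload, different meaning: the transfer target picked up,
        // so bridge the caller with the new party and drop the agent.
        Receive<PersonAdded>(_ => { /* bridge caller, drop transferring agent */ });
    }
}
```

Because each behavior declares only the handlers that make sense in that state, the “which workflow am I in?” question never turns into a tangle of flags and conditionals.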
We also built dead call detection through scheduled messages and actor passivation to keep the cluster clean. Actors that haven’t received messages within a timeout window shut themselves down and free up resources.
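Dead call detection via timeouts and passivation follows a standard Akka.NET pattern; this sketch uses an assumed five-minute window, not Luware's actual configuration.

```csharp
using System;
using Akka.Actor;
using Akka.Cluster.Sharding;

public sealed class TimedCallSessionActor : ReceiveActor
{
    public TimedCallSessionActor()
    {
        // If no message arrives within the window, treat the call as dead.
        Context.SetReceiveTimeout(TimeSpan.FromMinutes(5));

        Receive<ReceiveTimeout>(_ =>
            // Ask the parent shard region to passivate this entity: it stops
            // delivering new messages, then sends the stop message so the
            // actor shuts down cleanly and frees its resources.
            Context.Parent.Tell(new Passivate(PoisonPill.Instance)));
    }
}
```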
One of the outcomes I find most compelling is how Jason’s architecture scales across teams, not just across nodes.
We built reusable actor templates that encapsulate the core call management patterns: lifecycle management, behavior switching, event sourcing, dead call detection. These templates abstract the Akka.NET details behind clean interfaces.
When our sister team needed to build a new channel integration with a completely different set of APIs, they shipped their product on top of the same actor architecture without needing to become Akka.NET experts. They implemented the channel-specific logic while the framework handled all the hard distributed systems stuff. That team is already in production.
We also run the whole thing locally using .NET Aspire. I can run the entire new call handling platform on my laptop, interact with all the other microservices, step through the code inside the actor, view state, and watch messages come in and out. Being able to get it up and running with Aspire is a huge advantage in shifting quality left for us.
Phobos for Observability
Luware uses Phobos, our OpenTelemetry integration for Akka.NET, to get distributed tracing through their actor hierarchy. Standard APM tools can trace HTTP requests, but they don’t understand actor message processing, mailbox depth, or actor lifecycle events.
Phobos gives us full traceability from an inbound webhook request through specific actor message processing to outbound API calls. When a call fails, we can trace the entire chain of events across nodes and actors to find exactly where things broke. For a system handling millions of calling minutes per month, that traceability is not optional. That’s worth its weight in gold right there.
What Were the Results?
It just works, simple as that. The platform is well designed, and with a great partnership behind it, we're very confident in it.
Who Would You Recommend Akka.NET To?
Any problem with a serious concurrency component, where the state needs to be completely locked down. High-frequency trading, banking systems, anything with high concurrency and messaging. If you’ve got routing problems where you need to find a single machine within a cluster, I’d 100% recommend Akka.NET.
You have to study, you have to read, you have to learn and prototype, just like any framework. But a good partnership with folks who are willing to help, folks who actually write the software and know it inside out, makes all the difference.
Jason’s experience reinforces a pattern we see with nearly every Akka.NET adoption. Teams arrive with two or three genuinely hard distributed systems problems. They’ve exhausted the traditional approaches. And then the actor model doesn’t just solve those problems. It eliminates the categories of problems entirely.
If you’re building mission-critical systems on .NET and you want to talk architecture, get in touch with us. Or jump into the Akka.NET Discord and see for yourself how the community operates.
Observe and Monitor Your Akka.NET Applications with Phobos
Phobos automatically instruments your Akka.NET applications with OpenTelemetry — traces, metrics, and logs with built-in dashboards.
Enjoyed this post? Subscribe to our newsletter for more insights on distributed systems, Akka.NET, and .NET + AI.