Akka.NET + Kubernetes: Everything You Need to Know

Production lessons from years of running Akka.NET clusters at scale


Running Akka.NET in Kubernetes can feel like a daunting task if you’re doing it for the first time. Between StatefulSets, Deployments, RBAC permissions, health checks, and graceful shutdowns, there are a lot of moving parts to get right.

But here’s the thing: once you understand how these pieces fit together, Kubernetes actually makes running distributed Akka.NET applications significantly easier than trying to orchestrate everything yourself using ARM templates, Bicep scripts, or some other manual approach.

We’ve been running Akka.NET clusters in Kubernetes for years at Petabridge—both for our own products like Sdkbin and for customers who’ve built systems with over 1400 nodes. We’ve learned a lot the hard way, and this post is all about sharing those lessons so you don’t have to make the same mistakes we did.

This isn’t a hand-holding, step-by-step tutorial. Instead, I’m going to focus on the critical decisions you need to make, the pitfalls to avoid, and the best practices that actually matter when running Akka.NET in production on Kubernetes.

Why Akka.NET and Kubernetes Are a Great Match

Kubernetes gets an unfair amount of hate in the .NET community. People treat it like this impossibly complex piece of machinery when, in reality, it’s quite a bit simpler to work with than trying to manually orchestrate distributed applications using cloud-specific tools. But why is Kubernetes particularly well-suited for Akka.NET?

StatefulSets: Built for Stateful Applications

Kubernetes has a concept called a StatefulSet, which is specifically designed for applications that maintain state in memory. This is exactly what Akka.NET does with actors and Akka.Cluster.Sharding! StatefulSets provide several key benefits:

  • Stable, reusable identities: Every pod in a StatefulSet has a predictable DNS name that persists across restarts
  • Ordered, one-at-a-time updates: When you deploy a new version, StatefulSets replace pods sequentially rather than all at once
  • No split-brain scenarios: Because updates happen gradually, you avoid the virtual IP swap problem where you suddenly have two separate clusters

These characteristics align perfectly with how stateful distributed systems need to operate.
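If you haven’t worked with one before, a minimal StatefulSet for an Akka.NET workload looks roughly like this (the name, namespace, and image follow the drawtogether example used later in this post; treat it as an illustrative sketch rather than a complete manifest):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: drawtogether
  namespace: drawtogether
spec:
  serviceName: drawtogether   # Headless service that provides the stable DNS names
  replicas: 3
  updateStrategy:
    type: RollingUpdate       # Pods are replaced one at a time
  selector:
    matchLabels:
      app: drawtogether
  template:
    metadata:
      labels:
        app: drawtogether
    spec:
      containers:
        - name: drawtogether-app
          image: drawtogether

With this in place, the pods are addressable as drawtogether-0.drawtogether, drawtogether-1.drawtogether, and so on, and a rolling update replaces them one at a time rather than all at once.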

Simplified Mesh Networking

Akka.NET, like Microsoft Orleans and other peer-to-peer frameworks, builds mesh networks where every node needs to communicate with every other node. In traditional cloud environments, setting this up involves:

  • Configuring firewall rules so all processes in a deployment can communicate with each other
  • Managing private networks with reachable address spaces between instances
  • Coordinating security groups
  • Fighting with VPN configurations

Kubernetes headless services make all of this trivial. They create an internal address space that allows pods to discover and communicate with each other seamlessly across multiple physical nodes. No additional firewall configuration required.

Kubernetes Networking for Akka.NET

Graceful Transitions and Health Checks

One of the things I’ve really grown to appreciate about Kubernetes is how well it supports graceful transitions for stateful applications. When you’re shutting down an Akka.NET node, you don’t want to just pull the rug out from under it by instantly killing the process. There might be:

  • In-flight messages being processed
  • Cluster sharding rebalancing operations in progress
  • Cluster singletons that have to be moved
  • Persistence operations that need to complete
  • State that needs to be handed off to another node

Kubernetes provides several mechanisms to support this:

  • terminationGracePeriodSeconds: Gives your application time to shut down cleanly
  • Readiness and liveness probes: Control when traffic starts and stops flowing to your pods
  • Pre-stop hooks: Execute custom cleanup logic before termination

These features complement Akka.NET’s built-in graceful shutdown capabilities perfectly.
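The first two of these show up again later in this post; pre-stop hooks don’t, so here’s a rough sketch of where one sits in the pod spec. The sleep-based hook below is just an illustrative placeholder that delays SIGTERM briefly so load balancers can stop routing traffic; Akka.NET’s coordinated shutdown does the real cleanup work:

spec:
  template:
    spec:
      terminationGracePeriodSeconds: 35     # Total time allowed for shutdown
      containers:
        - name: drawtogether-app
          image: drawtogether
          lifecycle:
            preStop:
              exec:
                # Illustrative placeholder: pause briefly before SIGTERM is delivered
                command: ["/bin/sh", "-c", "sleep 5"]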

Zero-Downtime Rolling Upgrades

Kubernetes makes it very easy to orchestrate zero-downtime deployments. You introduce a new Docker image tag with your updated application, tell Kubernetes to use it, and configure your deployment strategy to gracefully introduce the new version. As long as your pods pass their health checks, the deployment continues. If they fail, the deployment stops and you can roll back.

This is exactly what Akka.NET applications need: controlled, gradual updates that don’t disrupt the cluster.
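In practice that whole workflow is a couple of kubectl commands (the image tag below is a made-up example):

# Point the Deployment at the new image tag
kubectl set image deployment/drawtogether drawtogether-app=myregistry/drawtogether:1.2.0 -n drawtogether

# Watch the rollout - blocks until the new pods pass their health checks or the rollout fails
kubectl rollout status deployment/drawtogether -n drawtogether

# If something goes wrong, revert to the previous revision
kubectl rollout undo deployment/drawtogether -n drawtogether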

The Big Decision: StatefulSets vs. Deployments

When you’re setting up Akka.NET in Kubernetes, the first major decision you’ll face is whether to use StatefulSets or Deployments. Both work, but they have different tradeoffs that significantly impact how your cluster behaves during updates.

Deployment Strategies Compared

StatefulSet Advantages

Stable DNS Names: Each pod gets a predictable DNS name like drawtogether-0.drawtogether, drawtogether-1.drawtogether, etc. This makes seed-based cluster formation possible without Akka.Management.

Built-in Ordering: Updates happen sequentially, which can be useful for certain types of migrations or when you need strict ordering guarantees.

StatefulSet Disadvantages

Here’s where things get interesting. The biggest problem with StatefulSets is how they handle rolling updates - and somewhat ironically, they create significantly more cluster churn than you’d expect if you’re using Akka.Cluster.Sharding.

Let me show you what happens during a StatefulSet rolling update:

StatefulSet Rolling Update Sequence

Notice the problem? Every pod gets rebalanced twice:

  1. Pod-0 leaves the cluster and all of its shards get instantly rebalanced to the surviving pods.
  2. Pod-0 rejoins the cluster (with the new version).
  3. Pod-1 leaves the cluster and rebalances all of its shards, which include half of the original shards from Pod-0.
  4. Pod-1 rejoins the cluster (with the new version).
  5. Pod-2 leaves the cluster and forces an immediate rebalance of its shards, which include all of the shards that originally lived on Pod-0.

This puts tremendous pressure on Akka.Cluster.Sharding and creates a lot of unnecessary persistence / remoting activity.

The other major issue we ran into with StatefulSets at Petabridge: when deployments fail, you have to manually intervene to fix them. If a pod fails its liveness check during a rolling update, the StatefulSet update process just halts. You can’t easily do an automated rollback like you can with Deployments.

For us on Sdkbin, this usually happened because our init container (which applied Entity Framework Core migrations) would fail, not because of Akka.NET itself. But regardless of the cause, having to manually fix failed StatefulSet updates in the middle of the night isn’t fun.

Deployment Advantages

Gentler Rolling Updates: Deployments use a “surge” strategy where they temporarily spin up additional pods with the new version before terminating old pods. This creates much less cluster churn:

Deployment Rolling Update with Surge Strategy

Because the new pod comes online before the old pod terminates, most shards just get transferred directly from the old pod to its replacement. Notice how there are only 2 rebalancing events (shown in orange) compared to 4 with StatefulSets. Much cleaner.

Easy Rollbacks: If a deployment fails, Kubernetes can automatically roll back to the previous version. You don’t need manual intervention.

Better for Dynamic Scheduling: If you’re using tools like Karpenter to move pods to spot instances or optimize resource usage, Deployments work better because they use IP-based addressing instead of DNS.

Deployment Disadvantages

Requires Akka.Management: You can’t use static seed nodes because pod DNS names are dynamic. You must use Akka.Management with Akka.Discovery.Kubernetes.

Temporary Resource Increase: During deployment, you’ll temporarily have more pods running (e.g., 4 pods instead of 3). If your cluster is so resource-constrained that one extra pod breaks things, you have bigger problems than choosing between StatefulSets and Deployments.

Must Use IP Addressing: For Akka.Remote communication, you need to use pod IPs rather than DNS names. This isn’t really a disadvantage; K8s networking actually works better with IPs when using dynamic pod scheduling, but it’s a difference worth noting.

Our Recommendation: Use Deployments

At Petabridge, we’ve moved from StatefulSets to Deployments for Sdkbin and generally recommend Deployments for new projects. The reduction in cluster churn and the automatic rollback capabilities more than make up for the requirement to use Akka.Management.

Here’s what the deployment strategy looks like in YAML:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: drawtogether
  namespace: drawtogether
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # Allow 1 extra pod during rollout
      maxUnavailable: 0  # Keep all pods available during update

This configuration ensures you always have at least 3 pods running and temporarily allows a 4th pod to come online during updates.

Configuring Akka.NET for Kubernetes

Let’s look at the actual code you need to get Akka.NET working in Kubernetes. I’ll use DrawTogether.NET, our real-time collaborative drawing application, as the reference implementation.

Setting Up Akka.Management with Kubernetes Discovery

The core of cluster formation in Kubernetes relies on Akka.Management and Akka.Discovery.Kubernetes. Here’s the configuration from our AkkaConfiguration.cs:

public static AkkaConfigurationBuilder ConfigureNetwork(
    this AkkaConfigurationBuilder builder,
    IServiceProvider serviceProvider)
{
    var settings = serviceProvider.GetRequiredService<AkkaSettings>();

    builder.WithRemoting(settings.RemoteOptions);

    if (settings.AkkaManagementOptions is { Enabled: true })
    {
        // Disable seed nodes so Akka.Management takes precedence
        var clusterOptions = settings.ClusterOptions;
        clusterOptions.SeedNodes = [];

        builder
            .WithClustering(clusterOptions)
            .WithAkkaManagement(setup =>
            {
                // Leave hostname blank in Kubernetes - uses pod IP
                setup.Http.HostName = settings.AkkaManagementOptions.Hostname?.ToLower();
                setup.Http.Port = settings.AkkaManagementOptions.Port;
                setup.Http.BindHostName = "0.0.0.0";
                setup.Http.BindPort = settings.AkkaManagementOptions.Port;
            })
            .WithClusterBootstrap(options =>
            {
                options.ContactPointDiscovery.ServiceName =
                    settings.AkkaManagementOptions.ServiceName;
                options.ContactPointDiscovery.PortName =
                    settings.AkkaManagementOptions.PortName;
                options.ContactPointDiscovery.RequiredContactPointsNr =
                    settings.AkkaManagementOptions.RequiredContactPointsNr;
                options.ContactPointDiscovery.ContactWithAllContactPoints = true;
                options.ContactPointDiscovery.StableMargin = TimeSpan.FromSeconds(5);
            }, autoStart: true);

        // Configure Kubernetes discovery
        switch (settings.AkkaManagementOptions.DiscoveryMethod)
        {
            case DiscoveryMethod.Kubernetes:
                builder.WithKubernetesDiscovery();
                break;
            // Other discovery methods omitted for brevity
        }
    }

    return builder;
}

Key Points:

  1. Leave hostname blank: In Kubernetes, Akka.Management should use the pod’s IP address, not a DNS name. The discovery system works better this way. We ensure this by just supplying a null value for this entry in DrawTogether’s configuration.

  2. Disable seed nodes: When Akka.Management is enabled, clear out any static seed node configuration. Akka.Management will handle cluster formation automatically.

  3. Contact point discovery: The ServiceName and PortName must match your Kubernetes service configuration (we’ll see that in a moment).
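For reference, here’s roughly how these options might be shaped in appsettings.json. The exact configuration classes from DrawTogether aren’t reproduced in this post, so treat the structure below as an illustrative assumption based on the property names and environment variables shown elsewhere in this article:

{
  "AkkaSettings": {
    "RemoteOptions": {
      "PublicHostname": null              // Left null in Kubernetes - the pod IP is used instead
    },
    "ClusterOptions": {
      "SeedNodes": []                     // Cleared when Akka.Management is enabled
    },
    "AkkaManagementOptions": {
      "Enabled": true,
      "Hostname": null,
      "Port": 8558,
      "ServiceName": "drawtogether",      // Must match the pod label selector (see next section)
      "PortName": "management",           // Must match the named port on your pods and service
      "RequiredContactPointsNr": 3,
      "DiscoveryMethod": "Kubernetes"
    }
  }
}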

How Pod Label Selection Works

Before we dive into the Kubernetes YAML, it’s important to understand how Akka.Discovery.Kubernetes actually finds which pods should join together to form a cluster.

When you call .WithKubernetesDiscovery() in your Akka.NET configuration, the discovery mechanism queries the Kubernetes API directly for pods, filtering by labels. Here’s the process:

  1. Queries the Kubernetes API for pods in the same namespace
  2. Filters pods by the label selector app={ServiceName} where ServiceName comes from ContactPointDiscovery.ServiceName
  3. Checks for the management port specified in ContactPointDiscovery.PortName (default: management)
  4. Returns the IP addresses and ports of matching pods to Akka.Management for cluster bootstrap

The key insight: Akka.Discovery.Kubernetes looks directly at pod labels, not Service selectors. By default, it searches for pods with the label app={ServiceName} where ServiceName is the value you configured in ContactPointDiscovery.ServiceName.

So if you configure ServiceName = "drawtogether", discovery will find all pods labeled with app: drawtogether in the namespace. Pods with different labels (like app: chat-service) will be ignored, even if they’re in the same namespace.

Default Label Selection Behavior

By default, Akka.Discovery.Kubernetes:

  • Looks for pods in the same namespace as the pod running the discovery
  • Filters pods using the label selector app={0} where {0} is replaced by ContactPointDiscovery.ServiceName
  • Looks for a port named management (or whatever you specify in PortName) on those pods
  • Returns pod IPs (or DNS names for StatefulSets) that match these criteria

Customizing the Label Selector

If you need more control over which pods are discovered (for example, if you want to use different labels or add additional filtering criteria), you can customize the label selector using the KubernetesDiscoveryOptions:

builder.WithKubernetesDiscovery(options =>
{
    options.PodLabelSelector = "app={0},environment=production";
});

The {0} placeholder gets replaced with your ContactPointDiscovery.ServiceName value. This example would only discover pods with both app: drawtogether AND environment: production labels. You can use any valid Kubernetes label selector syntax here.

Important: While your Kubernetes Service typically uses the same app: drawtogether selector to route traffic to the pods, the Service selector and the discovery label selector are technically independent. Discovery queries pods directly by labels—it doesn’t query the Service first. That said, keeping these selectors aligned is a best practice to avoid confusion.

This label-based selection is what prevents accidental cluster mixing. Multiple Akka.NET deployments in the same namespace will form separate clusters as long as they use different app labels.
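Concretely, this means the pod template in your Deployment has to carry the label that discovery filters on, and the management port has to be declared as a named container port. A trimmed-down sketch (replicas and the rollout strategy from earlier are omitted for brevity):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: drawtogether
  namespace: drawtogether
spec:
  selector:
    matchLabels:
      app: drawtogether          # The Deployment's own pod selector
  template:
    metadata:
      labels:
        app: drawtogether        # The label Akka.Discovery.Kubernetes filters on
    spec:
      containers:
        - name: drawtogether-app
          image: drawtogether
          ports:
            - containerPort: 8558
              name: management   # Named port that matches ContactPointDiscovery.PortName
            - containerPort: 5055
              name: akka-remote
            - containerPort: 8080
              name: http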

Kubernetes YAML Configuration

Now let’s look at the Kubernetes side. First, the headless service that enables pod discovery:

apiVersion: v1
kind: Service
metadata:
  name: drawtogether
  namespace: drawtogether
spec:
  publishNotReadyAddresses: true  # Critical for cluster formation!
  clusterIP: None                 # Makes this a headless service
  ports:
    - port: 8558
      name: management              # Matches PortName in Akka.Management
    - port: 5055
      name: akka-remote            # For Akka.Remote mesh networking
    - port: 8080
      name: http                   # For external HTTP traffic
  selector:
    app: drawtogether

The publishNotReadyAddresses Setting is Critical

This is a pitfall you absolutely want to avoid.[1] By default, Kubernetes won’t publish DNS entries for pods until they pass their readiness checks. But if your readiness check includes an Akka.Cluster health check (which verifies the node has joined the cluster), you create a catch-22:

  • Pod starts up
  • Tries to discover other pods to form a cluster
  • Can’t find other pods because their DNS entries aren’t published yet (they’re not ready either)
  • Can’t join cluster because there are no other pods
  • Fails readiness check because it hasn’t joined the cluster
  • DNS entry never gets published

Deadlock.

Setting publishNotReadyAddresses: true breaks this cycle by publishing DNS entries immediately when pods start, before they’ve passed their readiness checks.
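You can verify this behavior from inside the cluster: with publishNotReadyAddresses: true, the headless service resolves to pod IPs even while the pods are still failing their readiness probes. The busybox image below is just a convenient choice; any image with nslookup works:

kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -n drawtogether \
  -- nslookup drawtogether.drawtogether.svc.cluster.local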

Deployment Configuration

For a Deployment-based setup (recommended), here’s the environment variable configuration:

- name: POD_IP
  valueFrom:
    fieldRef:
      fieldPath: status.podIP
- name: AkkaSettings__RemoteOptions__PublicHostname
  value: "$(POD_IP)"  # Use IP instead of DNS
- name: AkkaSettings__AkkaManagementOptions__Enabled
  value: "true"
- name: AkkaSettings__AkkaManagementOptions__DiscoveryMethod
  value: "Kubernetes"

For StatefulSets, you’d use DNS-based addressing instead:

- name: POD_NAME
  valueFrom:
    fieldRef:
      fieldPath: metadata.name
- name: AkkaSettings__RemoteOptions__PublicHostname
  value: "$(POD_NAME).drawtogether"  # DNS-based addressing

RBAC: Giving Pods Permission to Discover Each Other

For Akka.Discovery.Kubernetes to work, your pods need permission to query the Kubernetes API and list other pods in the namespace. This is where Kubernetes RBAC (Role-Based Access Control) comes in.

You need three things:

  1. A ServiceAccount for your pods
  2. A Role that grants permission to list pods
  3. A RoleBinding that connects the ServiceAccount to the Role

Here’s the complete RBAC configuration from DrawTogether.NET:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: drawtogether
  namespace: drawtogether
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: pod-reader
  namespace: drawtogether
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "watch", "list"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: read-pods
  namespace: drawtogether
subjects:
  - kind: ServiceAccount
    name: drawtogether
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io

Then in your Deployment or StatefulSet spec:

spec:
  template:
    spec:
      serviceAccountName: drawtogether  # Use the ServiceAccount

With this RBAC configuration in place, when a pod starts up:

  1. It uses the drawtogether ServiceAccount credentials
  2. Calls the Kubernetes API to list pods in its namespace
  3. Filters the results by label selector app: drawtogether
  4. Checks each pod for the management port (port 8558)
  5. Uses those pod IPs/DNS names as contact points for cluster formation

This is why the RBAC permissions specifically grant get, watch, and list verbs for pods—Akka.Discovery.Kubernetes needs to directly query and continuously monitor pods as the cluster scales up and down.
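A quick way to verify the RBAC wiring before you deploy is to ask the Kubernetes API whether the ServiceAccount is actually allowed to list pods:

kubectl auth can-i list pods \
  --as=system:serviceaccount:drawtogether:drawtogether \
  -n drawtogether
# Prints "yes" once the Role and RoleBinding have been applied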

Health Checks: Liveness vs. Readiness

Kubernetes has two types of health checks, and understanding the difference is crucial:

Liveness Probes: Answer the question “Is this process alive?” If a liveness probe fails, Kubernetes will restart the pod. This should almost always return true unless something catastrophic has happened (like the actor system has shut down).

Readiness Probes: Answer the question “Is this process ready to accept traffic?” If a readiness probe fails, Kubernetes removes the pod from load balancers but doesn’t restart it. This should check whether the pod has access to all resources it needs to do its job.

Configuring Health Check Endpoints

We recently added comprehensive health check support to Akka.Hosting, including dedicated checks for Akka.Cluster and Akka.Persistence. Here’s how to configure them:

services.AddAkka("DrawTogether", (builder, provider) =>
{
    builder
        .WithAkkaClusterReadinessCheck()    // Checks if node has joined cluster
        .WithActorSystemLivenessCheck()      // Checks if actor system is running
        .WithSqlPersistence(
            connectionString: connectionString,
            // ... other persistence config ...
            journalBuilder: j =>
            {
                j.WithHealthCheck(name: "Akka.Persistence.Sql.Journal[default]");
            },
            snapshotBuilder: s =>
            {
                s.WithHealthCheck(name: "Akka.Persistence.Sql.SnapshotStore[default]");
            });
});

These integrate with ASP.NET Core’s Microsoft.Extensions.Diagnostics.HealthChecks system. Next, you need to expose them as HTTP endpoints for Kubernetes to call:

public static class HealthCheckEndpointsExtensions
{
    public static IEndpointRouteBuilder MapHealthCheckEndpoints(
        this IEndpointRouteBuilder endpoints)
    {
        // /healthz - all checks (diagnostic use)
        endpoints.MapHealthChecks("/healthz", new HealthCheckOptions
        {
            Predicate = _ => true,
            ResponseWriter = WriteHealthCheckResponse
        });

        // /healthz/live - liveness checks only
        endpoints.MapHealthChecks("/healthz/live", new HealthCheckOptions
        {
            Predicate = check => check.Tags.Contains("liveness"),
            ResponseWriter = WriteHealthCheckResponse
        });

        // /healthz/ready - readiness checks (persistence, cluster membership)
        endpoints.MapHealthChecks("/healthz/ready", new HealthCheckOptions
        {
            Predicate = check => check.Name.Contains("ready"),
            ResponseWriter = WriteHealthCheckResponse
        });

        return endpoints;
    }
}

Then in Program.cs:

app.MapHealthCheckEndpoints();

Kubernetes Health Probe Configuration

In your Deployment/StatefulSet YAML:

livenessProbe:
  httpGet:
    path: /healthz/live
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 20
  timeoutSeconds: 5
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /healthz/ready
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3

The readiness probe runs more frequently (every 10 seconds) because it’s used to determine whether traffic should be routed to the pod. The liveness probe runs less often (every 20 seconds) and is more tolerant of failures.

Securing Akka.Remote with TLS Certificates

One thing we haven’t talked about much publicly before is how to secure Akka.Remote communication using TLS certificates stored in Kubernetes Secrets. In production environments, you don’t want your cluster nodes communicating over unencrypted channels, especially if they’re exchanging sensitive data.

Creating and Storing TLS Certificates

First, generate a self-signed certificate for local testing (in production, use certificates from your certificate authority):

# Generate a self-signed certificate
openssl req -x509 -newkey rsa:4096 -keyout akka-key.pem -out akka-cert.pem -days 365 -nodes

# Convert to PKCS12 format (.pfx)
openssl pkcs12 -export -out akka-node.pfx -inkey akka-key.pem -in akka-cert.pem -password pass:Test123!

# Create Kubernetes secret
kubectl create secret generic akka-tls-cert \
  --from-file=akka-node.pfx=akka-node.pfx \
  --namespace=drawtogether

Mounting the Certificate in Pods

In your Deployment/StatefulSet spec:

spec:
  template:
    spec:
      containers:
      - name: drawtogether-app
        env:
        - name: AkkaSettings__TlsSettings__Enabled
          value: "true"
        - name: AkkaSettings__TlsSettings__CertificatePath
          value: "/certs/akka-node.pfx"
        - name: AkkaSettings__TlsSettings__CertificatePassword
          value: "Test123!" # This should really be fetched from a secret
        - name: AkkaSettings__TlsSettings__ValidateCertificates
          value: "false"  # Set to true in production with real certs
        volumeMounts:
        - name: tls-cert
          mountPath: /certs
          readOnly: true
      volumes:
      - name: tls-cert
        secret:
          secretName: akka-tls-cert

C# Configuration for TLS

The configuration code automatically detects TLS settings and applies them to Akka.Remote:

private static void ConfigureRemoteOptionsWithTls(AkkaSettings settings)
{
    var tlsSettings = settings.TlsSettings!;
    var remoteOptions = settings.RemoteOptions;

    // Load the certificate from the mounted path
    var certificate = tlsSettings.LoadCertificate();
    if (certificate is null)
        throw new InvalidOperationException(
            "TLS is enabled but no certificate could be loaded");

    // Configure SSL through RemoteOptions
    remoteOptions.EnableSsl = true;
    remoteOptions.Ssl = new SslOptions
    {
        X509Certificate = certificate,
        SuppressValidation = !tlsSettings.ValidateCertificates
    };

    // Update seed nodes to use akka.ssl.tcp:// protocol
    if (settings.ClusterOptions.SeedNodes?.Length > 0)
    {
        settings.ClusterOptions.SeedNodes = settings.ClusterOptions.SeedNodes
            .Select(node => node.Replace("akka.tcp://", "akka.ssl.tcp://"))
            .ToArray();
    }
}

With this configuration, all Akka.Remote communication between cluster nodes uses TLS encryption.
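The LoadCertificate helper referenced above isn’t reproduced in this post. Here’s a minimal sketch of what such a method might look like, assuming TlsSettings binds the CertificatePath and CertificatePassword values from the environment variables shown earlier:

using System.IO;
using System.Security.Cryptography.X509Certificates;

public class TlsSettings
{
    public bool Enabled { get; set; }
    public string? CertificatePath { get; set; }
    public string? CertificatePassword { get; set; }
    public bool ValidateCertificates { get; set; }

    // Illustrative sketch: loads the PKCS12 (.pfx) file mounted from the Kubernetes Secret
    public X509Certificate2? LoadCertificate()
    {
        if (string.IsNullOrEmpty(CertificatePath) || !File.Exists(CertificatePath))
            return null;

        return new X509Certificate2(CertificatePath, CertificatePassword);
    }
}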

Using Petabridge.Cmd as a Sidecar Container

One of the most useful tools for managing and troubleshooting Akka.NET clusters in Kubernetes is Petabridge.Cmd, our free command-line tool for Akka.NET. We ship a pre-built Docker sidecar container that makes it trivially easy to include in your pods.

Enabling Petabridge.Cmd in Your Application

First, configure Petabridge.Cmd in your Akka.NET application:

builder.Services.ConfigureAkka(builder.Configuration,
    (configurationBuilder, provider) =>
    {
        var options = provider.GetRequiredService<AkkaSettings>();
        configurationBuilder.AddPetabridgeCmd(
            options: options.PbmOptions,
            hostConfiguration: cmd =>
            {
                cmd.RegisterCommandPalette(ClusterCommands.Instance);
                cmd.RegisterCommandPalette(ClusterShardingCommands.Instance);
            });
    });

This binds Petabridge.Cmd to port 9110 (the default) inside your container.

Adding the Sidecar Container

In your Deployment/StatefulSet, add a second container alongside your application:

spec:
  template:
    spec:
      containers:
      - name: pbm-sidecar
        image: petabridge/pbm:latest
      - name: drawtogether-app
        image: drawtogether
        # ... rest of your application config

The sidecar container runs the Petabridge.Cmd client, which connects to localhost:9110 to communicate with your Akka.NET application.

Using Petabridge.Cmd

With the sidecar in place, you can exec into the pod and run Petabridge.Cmd commands:

# Exec into the pbm-sidecar container
kubectl exec -it drawtogether-0 -c pbm-sidecar -- /bin/sh

# Show cluster status
pbm cluster show

# View cluster sharding regions
pbm cluster sharding regions

# Get statistics for a specific shard region
pbm cluster sharding region-stats drawing-session

This is invaluable for troubleshooting and understanding what’s happening inside your cluster without needing to deploy custom monitoring tools.

Graceful Shutdowns and terminationGracePeriodSeconds

When Kubernetes needs to terminate a pod (during a deployment, scale-down, or node drain), you want your Akka.NET application to shut down gracefully. This means:

  1. Stop accepting new work
  2. Finish processing in-flight messages
  3. Hand off cluster sharding responsibility to other nodes
  4. Persist any final state to your database or event journal
  5. Shut down cleanly

Kubernetes provides the terminationGracePeriodSeconds setting to control how much time a pod has to shut down before it’s forcibly killed:

spec:
  template:
    spec:
      terminationGracePeriodSeconds: 35

How This Works:

  1. Kubernetes sends a SIGTERM signal to your application
  2. Your application has terminationGracePeriodSeconds (35 seconds in this example) to shut down
  3. If it doesn’t shut down within that time, Kubernetes sends SIGKILL and forcibly terminates it

Akka.NET’s coordinated shutdown system automatically handles graceful termination when it receives SIGTERM. The sequence looks like this:

  1. Kubernetes sends SIGTERM to the Akka.NET application
  2. The application begins failing its readiness check
  3. Kubernetes removes the pod from the load balancer
  4. The application initiates a cluster leave
  5. The application persists any final state to the persistence store
  6. The remaining cluster members rebalance the departing node’s shards
  7. The actor system shuts down and the process exits
  8. If the whole sequence takes longer than 35 seconds, Kubernetes sends SIGKILL

For Akka.NET applications with cluster sharding, 35 seconds is usually sufficient. If you have very large amounts of state to persist or complex shutdown logic, you might need to increase this value.
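If you have shutdown work that Akka.NET doesn’t already handle for you (flushing caches, de-registering from an external system, etc.), you can hook it into the same SIGTERM-driven sequence by registering a custom CoordinatedShutdown task. A minimal sketch using Akka.Hosting’s AddStartup hook; the task body is just a stand-in for whatever cleanup your application needs:

using System;
using System.Threading.Tasks;
using Akka;
using Akka.Actor;

// Inside AddAkka(...), alongside the rest of the builder configuration shown in this post:
builder.AddStartup((system, registry) =>
{
    var shutdown = CoordinatedShutdown.Get(system);

    // "before-cluster-shutdown" runs after incoming traffic stops but before this node
    // leaves the cluster and hands off its shards
    shutdown.AddTask(CoordinatedShutdown.PhaseBeforeClusterShutdown, "custom-cleanup", async () =>
    {
        // Stand-in for application-specific cleanup work
        await Task.Delay(TimeSpan.FromMilliseconds(100));
        return Done.Instance;
    });
});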

Pod Resource Limits and Horizontal Pod Autoscaling

One final consideration: setting appropriate resource requests and limits for your Akka.NET pods, and whether to use Horizontal Pod Autoscaling (HPA).

Resource Requests and Limits

resources:
  requests:
    memory: "512Mi"
    cpu: "500m"      # 0.5 CPU cores
  limits:
    memory: "4Gi"
    # No CPU limit - Akka.NET can be very bursty during recovery

IMPORTANT: These are example values only.

Do NOT copy-paste these into production without profiling your actual workload.

Serious production Akka.NET applications, especially those using Akka.Persistence, Akka.Cluster.Sharding, or running A LOT of actors with in-memory state, may require substantially more memory. An OOMKill during cluster rebalancing or persistence recovery can trigger cascading failures across your cluster.

Always monitor your actual memory usage in production and set limits with comfortable headroom.

Requests tell Kubernetes how much of each resource the pod needs to function normally. Kubernetes uses this for scheduling decisions.

Limits define the maximum resources the pod can consume. If a pod exceeds its memory limit, it gets OOMKilled. If it exceeds CPU limits, it gets throttled.

For Akka.NET applications, I generally recommend:

  • Conservative memory requests: Start with what you know you need and monitor actual usage
  • Generous memory limits: Akka.NET applications can temporarily spike memory usage during cluster rebalancing or message bursts
  • Set CPU requests, but avoid CPU limits: Akka.NET can be extremely bursty, especially during pod initialization when actors are recovering state from Akka.Persistence. CPU throttling during these critical periods can cause recovery to take longer, leading to health check failures and potential failure cascades. Either set no CPU limit at all, or set it very high if your cluster policies require one

Horizontal Pod Autoscaling (HPA)

HPA can automatically scale your Akka.NET cluster based on metrics like CPU utilization or custom application metrics. However, be cautious with HPA for stateful Akka.NET applications:

Why HPA Can Be Problematic:

  1. Cluster rebalancing overhead: Every time HPA scales up or down, Akka.Cluster.Sharding has to rebalance shards across the new pod count. Frequent scaling creates constant rebalancing.

  2. Latency during rebalancing: While shards are moving, requests to those entities experience higher latency.

  3. Persistence pressure: If your entities use Akka.Persistence, rebalancing means more entities rehydrating from storage.

When HPA Makes Sense:

  • Your application has predictable traffic patterns (e.g., scale up during business hours, down at night)
  • You use long cooldown periods (e.g., only scale every 10-15 minutes minimum)
  • Your entities are relatively lightweight and cheap to move

Our Approach at Petabridge:

For Sdkbin, we don’t use HPA. We maintain a fixed cluster size that’s provisioned for peak load. We just don’t have high enough traffic to really merit investing in it.

If you do use HPA, configure it conservatively:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: drawtogether-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: drawtogether
  minReplicas: 3
  maxReplicas: 10
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 600  # Wait 10 minutes before scaling down
    scaleUp:
      stabilizationWindowSeconds: 300  # Wait 5 minutes before scaling up
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Putting It All Together

Running Akka.NET in Kubernetes is absolutely viable and, in many ways, easier than alternative orchestration approaches. The key is understanding how Kubernetes features map to Akka.NET’s requirements:

  • Use Deployments over StatefulSets for less cluster churn and easier rollbacks
  • Configure Akka.Management + Akka.Discovery.Kubernetes for automatic cluster formation
  • Set up proper RBAC so pods can discover each other
  • Implement health checks that distinguish between liveness and readiness
  • Use Kubernetes Secrets to secure Akka.Remote with TLS
  • Add the Petabridge.Cmd sidecar for operational visibility
  • Configure terminationGracePeriodSeconds to allow graceful shutdowns
  • Set appropriate resource limits based on your actual workload

All of the code from this post is available in the DrawTogether.NET repository. Clone it, run it locally with Docker Desktop’s Kubernetes support, and experiment with different configurations.
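If you want a rough idea of what that looks like, the local workflow is something like the following; the k8s/ manifest directory and image tag are assumptions, so check the repository’s README for the actual layout and any helper scripts it ships:

# Build the application image locally (tag is an example)
docker build -t drawtogether .

# Create the namespace and apply the manifests (directory name is an assumption)
kubectl create namespace drawtogether
kubectl apply -f k8s/ -n drawtogether

# Watch the cluster form
kubectl get pods -n drawtogether -w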

If you’re building production Akka.NET applications in Kubernetes and need help, reach out to Petabridge. We’ve been doing this for years and can help you avoid the mistakes we made along the way.


[1] Unless you’re using IP-based networking in K8s, in which case publishNotReadyAddresses doesn’t affect anything.

Written by Aaron Stannard on October 24, 2025