How We Made Phobos 2.4's OpenTelemetry Usage 62% Faster

OpenTelemetry Performance Optimization Practices

Phobos is our observability + monitoring library for Akka.NET, and last year we launched Phobos 2.0, which moved our entire implementation onto OpenTelemetry.

Phobos 2.0 Instruments Akka.NET with OpenTelemetry

One of the recurring issues we’ve had with Phobos: many of our customers build low-latency, real-time applications, and adding observability to a software system comes with a noticeable latency + throughput penalty. At the beginning of July a customer asked us, “is there any way you can make this faster?”

Challenge accepted - and completed.

Earlier this week we shipped Phobos 2.4.0, which is a staggering 62% faster than all of our previous Phobos 2.x implementations - actually Phobos 2.4 is even faster than that for real-world applications and we’ll get to that in a moment.

This blog post is really about optimizing hot-paths for maximum OpenTelemetry tracing and metrics performance, and the techniques we used in the course of developing Phobos 2.4.

Let’s dig in.

Historical Benchmark Data

Since we shipped the first version of Phobos back in 2018 we’ve also made “Phobos Performance Impact” data available on our product website - that way users considering purchasing Phobos could set their expectations for how much overhead full Phobos instrumentation (tracing + metrics) adds to their existing user code. This data was generated from a standard NBench macro-benchmark - i.e. live tracing of real actors running in-process.

We mostly use BenchmarkDotNet for benchmarking our code these days, but I find that BenchmarkDotNet has reproducibility and precision issues with long-running, highly concurrent / multi-core macro-benchmarks. So for those big benchmarks I still prefer NBench or writing a custom .NET console application to do the measuring for me - I tend to get results that more closely approximate real-world usage and are fairly reproducible.

We’re also big believers in benchmarking continuity - you should use the same benchmark you’ve been using in order to demonstrate relative improvements or changes in performance. Any change to that benchmark’s implementation invalidates historical results - so for both of those reasons we’ve been using the same benchmark to track Phobos’ performance impact for the past five years.

Phobos is a proprietary library so the source code isn’t available but here’s what our primary benchmark looks like:

[PerfBenchmark(
    Description = "Measures the throughput of a fully Phobos instrumented actor",
    RunMode = RunMode.Iterations, NumberOfIterations = 13, TestMode = TestMode.Measurement,
    RunTimeMilliseconds = 1000)]
[CounterMeasurement(MailboxCounterName)]
[GcMeasurement(GcMetric.TotalCollections, GcGeneration.AllGc)]
[MemoryMeasurement(MemoryMetric.TotalBytesAllocated)]
public void Instrumented_Throughput(BenchmarkContext context)
{
    for (var i = 0; i < MailboxMessageCount;)
    {
        _fullyInstrumentedActor.Tell(dataExample);
        _fullyInstrumentedActor.Tell(intExample);
        _fullyInstrumentedActor.Tell(stringExample);
        ++i;
    }

    _mailboxThroughput.Increment(_fullyInstrumentedActor.Ask<int>(Collect.Instance, TimeSpan.FromSeconds(90))
        .Result);
}

We’re measuring the cumulative throughput of a single actor processing 3_000_000 messages, a third of which are a reference type, a third of which are a value type, and a third of which are a simple string - so a blend of different potential messages to trace and monitor.

All benchmark figures in this article were gathered from my Intel NUC 12 Pro with the following specs:


BenchmarkDotNet v0.13.6, Windows 11 (10.0.22621.1992/22H2/2022Update/SunValley2)
12th Gen Intel Core i7-1260P, 1 CPU, 16 logical and 12 physical cores
.NET SDK 7.0.302
  [Host]     : .NET 7.0.5 (7.0.523.17405), X64 RyuJIT AVX2
  DefaultJob : .NET 7.0.5 (7.0.523.17405), X64 RyuJIT AVX2


I should also point out that none of these benchmarks were recorded with .NET 7’s Dynamic PGO enabled, which would cause these numbers to improve even more.

Phobos 2.3.1 Performance

Here is our performance as of Phobos 2.3.1, measured in terms of number of messages per second a single actor can process while various tracing / monitoring settings are enabled or disabled:

Benchmark Max (r/s) Average (r/s) Min (r/s) StdDev (r/s)
Tracing+Metrics 1,263,278 1,145,856 1,088,398 70,099
Tracing 1,952,431 1,871,857 1,715,370 76,396
Metrics 2,670,549 2,585,137 2,519,923 46,618
Uninstrumented 5,870,611 5,651,058 5,356,407 145,742

Our baseline is our “uninstrumented throughput” - that’s the number of messages per second a single Akka.NET actor can process when Phobos is installed (which adds some overhead on top of Akka.NET) but not actually enabled.

Our target is to improve our fully instrumented “Tracing + Metrics” number - this is when Phobos captures the most possible information and is also at its most resource-intensive. You can also see from these initial numbers that tracing is demonstrably more expensive than monitoring.

Profiling

Phobos does a lot of things under the covers: it automatically creates relevant spans and metrics, filters out irrelevant ones, injects telemetry behind the scenes of each Akka.NET actor, and provides a configuration system to help fine-tune where telemetry is needed and where it’s not.

Most of Phobos’s value comes from being able to capture all of this data automatically, without any real instrumentation needed on the part of the end-user - however, that automation comes with a cost: Phobos has to sit in the middle of the messaging hot-path and record traces + metrics, which inevitably impacts application performance. Every OpenTelemetry instrumentation library has this issue, including ASP.NET’s and others.

To understand where to begin our optimization we need to profile the hot-path and see where the CPU is busiest - we use JetBrains’ dotTrace integration in Rider for this.

Phobos 2.3.1 dotTrace profiling results

Not what I expected - according to the CPU profiler, it’s our OpenTelemetry metrics API calls that are the biggest resource hogs… even though the tracing system is demonstrably, significantly slower in our benchmark? OK - we’ll optimize those first and retest.

OpenTelemetry.Metrics Optimization

Two OpenTelemetry.Metrics optimizations we’ve used from the very beginning of Phobos 2.0:

  • Use “bound” metrics - store all of your Counters, Histograms, et al. as singletons and reuse them everywhere. This avoids the overhead of constantly creating and disposing frequently used metric objects.
  • Use TagList to pass in metric dimensions - System.Diagnostics.TagList is a struct that allows you to store up to 8 KeyValuePair<string,object> “tags” that you can use on any given metric instance. In Phobos it’s how we record dimensions like “which actor type processed this message” or “which type of message was processed?” - and this data can, in turn, be aggregated into dashboards on platforms like Grafana to visualize actor message throughput by actor type / message type.

The TagList data structure is very cleverly designed to prevent list or array allocations (as long as you don’t exceed 8 tags) - and has been heavily performance-optimized by Microsoft already.
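
Here’s a minimal sketch of what those two practices look like together - the Meter name, instrument name, and tag keys below are illustrative, not Phobos’ actual ones:

using System.Diagnostics;
using System.Diagnostics.Metrics;

public static class ActorMetrics
{
    // "bound" instruments: created once, stored as singletons, and reused for every measurement
    private static readonly Meter AppMeter = new("MyApp.Actors");

    public static readonly Counter<long> MsgsReceived =
        AppMeter.CreateCounter<long>("akka_messages_received");

    public static void RecordReceive(string actorType, string messageType)
    {
        // TagList avoids list/array allocations as long as we stay at 8 tags or fewer
        var tags = new TagList
        {
            { "actor_type", actorType },
            { "message_type", messageType }
        };

        MsgsReceived.Add(1, in tags);
    }
}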

Based on our profiling data though - the way we were creating tags was quite expensive. How could that be?

Optimizing TagList Creation & Propagation

Our call sites that manage actor message-processing metrics look like this:

private Option<TimerContext> RecordMetrics(object message, long? latencyStart = null)
{
    var timer = Option<TimerContext>.None;
    if (InstrumentationSettings.Monitor)
    {
        var metricTags = MetricTags.WithMessageClass(message.GetType().ToCleanName()).ToTags(); // high perf impact

        timer = Metrics.MessageReceiveLatency.CreateTimer(metricTags,
            latencyStart ?? TimestampHelper.GetTimestamp());

        RecordMsgMetrics(metricTags);
    }

    return timer;
}

private void RecordMsgMetrics(TagList metricTags)
{
    Metrics.MsgsReceived.Add(1, metricTags);

    // record mailbox depth if it's configured
    if (InstrumentationSettings.MonitorMailbox)
    {
        Metrics.MailboxDepth.Record(Mailbox.MessageQueue.Count, MetricTags.ToTags());
    }
}

The line that was getting us into trouble was the .ToTags() call, which produces a brand new TagList each time it’s called:

public TagList ToTags()
{
    var tagList = new TagList();
    
    if (RoleTag.HasValue)
    {
        tagList.Add(RoleTag.Value);
    }

    if (AddressTag.HasValue)
    {
        tagList.Add(AddressTag.Value);
    }

    if (ActorClassTag.HasValue)
    {
        tagList.Add(ActorClassTag.Value);
    }

    if (MessageTypeTag.HasValue)
    {
        tagList.Add(MessageTypeTag.Value);
    }
    
    return tagList;
}

Why is this so expensive? As it turns out I made some fundamental attribution errors about how TagLists can be shared safely across many concurrent invocations of a Metric.Record call:

  • TagList is a struct, so it has plain copy-by-value semantics: every method call gets its own copy, so there’s no need for me to worry about mutability across multiple asynchronous method calls - I don’t need to create a new TagList every time;
  • Because of those same copy-by-value semantics, passing the same struct across multiple methods results in unnecessary copying on the stack.

These are both very fixable problems, but first - let’s measure the impact of misusing TagList in 2.3.1:

Method Mean Error StdDev Allocated
ToTagsWithActorClass 42.64 ns 0.853 ns 0.798 ns -
ToTagsWithActorClassAndMessageType 50.30 ns 0.974 ns 1.082 ns -

50ns is pretty high, considering we’re not even allocating any memory on the heap. I wonder what the aggregate impact is on the component that calls RecordMetrics?

Method Iterations MonitoringEnabled TracingEnabled Mean Error StdDev Req/sec Gen0 Allocated
ReceiveMessageWithoutTrace 10000 False False 69.37 ns 1.092 ns 0.912 ns 14,414,490.04 - -
ReceiveMessageWithTrace 10000 False False 84.40 ns 1.723 ns 1.611 ns 11,848,659.18 - -
ReceiveMessageWithoutTrace 10000 False True 324.63 ns 6.237 ns 7.183 ns 3,080,409.80 0.0243 232 B
ReceiveMessageWithTrace 10000 False True 319.70 ns 5.974 ns 5.588 ns 3,127,930.86 0.0243 232 B
ReceiveMessageWithoutTrace 10000 True False 226.81 ns 4.211 ns 3.733 ns 4,408,952.03 - -
ReceiveMessageWithTrace 10000 True False 228.39 ns 4.402 ns 5.567 ns 4,378,517.17 - -
ReceiveMessageWithoutTrace 10000 True True 499.49 ns 9.815 ns 10.909 ns 2,002,030.55 0.0238 232 B
ReceiveMessageWithTrace 10000 True True 463.39 ns 8.739 ns 8.174 ns 2,158,032.08 0.0243 232 B

When just metrics are enabled the entire operation takes roughly 228.4ns - so ToTags is taking up roughly 1/5th of that time.

How to fix this? First, you’ll notice that all of the built-in System.Diagnostics.Metrics.Instrument<T> measurement methods (counters, histograms, et al. derive from this class) look like this:

public void Add(T delta, in TagList tagList) => RecordMeasurement(delta, in tagList); // notice the `in` keyword

To eliminate all of the unnecessary copying of the TagList, we can do what the OTel metrics API does and decorate our own methods that consume TagLists with the in keyword:

private void RecordMsgMetrics(in TagList metricTags)
{
    Metrics.MsgsReceived.Add(1, metricTags);

    // record mailbox depth if it's configured
    if (InstrumentationSettings.MonitorMailbox)
    {
        Metrics.MailboxDepth.Record(Mailbox.MessageQueue.Count, metricTags);
    }
}

That eliminates one unnecessary round of copying right there. Low-hanging fruit. But what about the unnecessary creation of a new TagList every single time?

public BuiltInTags(Option<KeyValuePair<string, object>> roleTag,
    Option<KeyValuePair<string, object>> addressTag,
    Option<KeyValuePair<string, object>> actorClassTag)
{
    RoleTag = roleTag;
    AddressTag = addressTag;
    ActorClassTag = actorClassTag;
    
    var tags = new TagList();

    if(AddressTag.HasValue)
        tags.Add(AddressTag.Value);
    if(RoleTag.HasValue)
        tags.Add(RoleTag.Value);
    if(ActorClassTag.HasValue)
        tags.Add(ActorClassTag.Value);

    Tags = tags;
}

private TagList Tags { get; }

public TagList WithMessageClass(string messageClassName)
{
    var messageTypeTag = new KeyValuePair<string, object>(MessageTypeLabel, messageClassName);
    var tags = ToTags();
    tags.Add(messageTypeTag);
    return tags;
}

Now we’re making the struct’s copy-by-value semantics work for us - we no longer rebuild the TagList tag-by-tag on every call: returning the TagList from the ToTags() method automatically hands the caller a fresh copy, and we don’t have to worry about mutability when passing this TagList into all of our subsequent calls!
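
Phobos’ source isn’t public, but for completeness the corresponding ToTags() presumably just returns the cached property now - a one-line sketch:

// Sketch: returning the cached struct hands the caller its own copy,
// so the original Tags can never be mutated by downstream code.
public TagList ToTags() => Tags;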

What’s the performance impact?

Method Mean Error StdDev Req/sec Allocated
ToTagsWithActorClass 24.17 ns 0.384 ns 0.360 ns 41,365,333.48 -
ToTagsWithActorClassAndMessageType 30.66 ns 0.633 ns 0.528 ns 32,616,520.86 -

From 50ns to 30ns - that’s a 40% improvement! Not bad for a little struct shuffling.

So what’s the aggregate improvement look like in Phobos’ metrics processing?

Method Iterations MonitoringEnabled TracingEnabled MessageKind Mean Error StdDev Req/sec Gen0 Allocated
ReceiveMessageWithoutTrace 10000 True False Primitive 152.27 ns 2.953 ns 3.282 ns 6,567,116.76 - -
ReceiveMessageWithTrace 10000 True False Primitive 148.97 ns 2.987 ns 2.933 ns 6,712,556.62 - -
ReceiveMessageWithoutTrace 10000 True False Class 149.32 ns 2.871 ns 2.685 ns 6,696,818.83 - -
ReceiveMessageWithTrace 10000 True False Class 150.60 ns 3.000 ns 2.946 ns 6,640,183.19 - -
ReceiveMessageWithoutTrace 10000 True False Record 153.54 ns 2.762 ns 2.584 ns 6,513,013.88 - -
ReceiveMessageWithTrace 10000 True False Record 150.68 ns 2.945 ns 2.755 ns 6,636,731.49 - -

We went from ~225ns to ~150ns in most cases, a 33% improvement! That aggregate improvement was the result of some other non-OTel changes too (e.g. avoiding copies of other struct types, just like the TagList; caching frequently used strings - sketched below), but it all stems from the same place: in tight loops that are frequently executed, you still have to be careful about copying structs around the stack - use the in modifier generously.
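
As for the string caching mentioned above, here’s a rough sketch of the idea - the ToCleanName formatting below is a placeholder, not Phobos’ actual implementation:

using System;
using System.Collections.Concurrent;

public static class TypeNameCache
{
    // type names have low cardinality, so compute each display string exactly once per Type
    private static readonly ConcurrentDictionary<Type, string> Cache = new();

    public static string ToCleanName(this Type type) =>
        Cache.GetOrAdd(type, static t => t.Name); // placeholder formatting logic
}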

OpenTelemetry.Trace Optimizations

While our profiler showed that metrics calls consumed the most CPU, we knew this was misleading - and that’s part of the balancing act of performance management: data is contextual.

Tracing is demonstrably more expensive than metrics for a few reasons:

  1. Requires explicit propagation of TelemetrySpans / SpanContext / Baggage between local and remote calls (see the sketch after this list);
  2. Requires stringification of important contextual data, such as logs, attributes, and events; and
  3. Always requires actual heap allocations that have to be garbage-collected, whereas OpenTelemetry.Metrics can be done with zero allocations.
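
To make point 1 concrete, here’s roughly what cross-process propagation involves using the standard OpenTelemetry.Api propagator - an illustrative sketch, not Phobos’ actual envelope code:

using System.Collections.Generic;
using System.Diagnostics;
using OpenTelemetry;
using OpenTelemetry.Context.Propagation;

public static class ContextCarrier
{
    private static readonly TextMapPropagator Propagator = Propagators.DefaultTextMapPropagator;

    // serializes the current trace context + baggage into headers that travel with a remote message
    public static Dictionary<string, string> Inject(Activity activity)
    {
        var headers = new Dictionary<string, string>();
        Propagator.Inject(
            new PropagationContext(activity.Context, Baggage.Current),
            headers,
            static (carrier, key, value) => carrier[key] = value);
        return headers;
    }
}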

So why didn’t our CPU profiling show that trace methods were more expensive? Because most of the expense was tied up in object allocations and garbage collection - which don’t show up as easily identifiable user method calls in the profiler.

NOTE: if you want to learn more about this type of subject matter, please see my talk on “.NET Systems Programming Learned the Hard Way.”

Given that important systems programming context, we need to either:

  1. Defer allocations so they happen outside the hot-path (i.e. they occur inside the OpenTelemetry processing pipeline instead), or
  2. Prevent allocations from happening when they aren’t necessary.

Both of these approaches improve performance around the hot-path - so what does that hot-path look like in Phobos 2.3.1?

private TelemetrySpan CreateActiveSpan(object underlyingMsg, SpanContext? context = null,
    Baggage baggage = default, bool followFromActiveSpan = true, long? startTimeUtcTicks = null)
{
    CompleteLifespanScope();

    var startTime = startTimeUtcTicks != null
        ? new DateTimeOffset(startTimeUtcTicks.Value, TimeSpan.Zero)
        : DateTimeOffset.UtcNow;

    TelemetrySpan sp = null;
    if (context == null)
    {
        sp = Tracer.StartRootSpan(underlyingMsg.GetType().GetOperationName(),
            SpanKind.Consumer, InitialAttributes, startTime: startTime);
            
        Tracer.WithSpan(sp); // mark the current span as active
    }
    else
    {
        sp = Tracer.StartActiveSpan(underlyingMsg.GetType().GetOperationName(),
            SpanKind.Consumer, parentContext: context.Value, InitialAttributes, startTime: startTime);
    }

    // Restore the stored baggage
    Baggage.Current = baggage;
    sp.SetAttribute(MsgSenderTagName, (Sender ?? ActorRefs.Nobody).Path.ToStringWithAddress())
        .SetAttribute(MessageTagName, underlyingMsg.GetType().ToCleanName());

    // Added to resolve https://github.com/petabridge/phobos-issues/issues/59
    if (!InstrumentationSettings.LogMessageEvents) return sp;
    if (startTimeUtcTicks != null)
    {
        sp.AddEvent("waiting", startTime);
    }
    var attributes = new SpanAttributes();
    attributes.Add("content", underlyingMsg.ToString());
    sp.AddEvent("message", attributes);

    return sp;
}

Where do allocations occur inside this method?

  1. Creating the original TelemetrySpan in the Tracer.StartRootSpan or Tracer.StartActiveSpan method - there’s nothing we can do to prevent this, as it’s required for tracing.
  2. Both of the SetAttribute calls involve two allocations each: expanding the attributes collection size internally AND the “stringification” of the data type.
  3. The final AddEvent call - which includes the creation of a new SpanAttributes object and the underlyingMsg.ToString() call.

Deferred string-ification

We made some major performance optimizations in Akka.NET v1.5 around deferring the stringification of logging messages - given that experience, eliminating the underlyingMsg.ToString() call seemed like the best target.

Some relevant numbers from our CreateActiveSpan benchmark for different types of messages an actor might process:

Method MessageKind Mean Error StdDev Req/sec Gen0 Allocated
CreateNewRootSpan Primitive 119.1 ns 2.17 ns 2.03 ns 8,399,591.08 0.0246 232 B
CreateChildSpan Primitive 123.8 ns 2.47 ns 2.42 ns 8,076,859.69 0.0246 232 B
CreateChildSpanWithBaggage Primitive 122.3 ns 2.18 ns 2.04 ns 8,179,444.13 0.0246 232 B
CreateNewRootSpan Class 164.4 ns 2.80 ns 2.62 ns 6,081,681.47 0.0348 328 B
CreateChildSpan Class 170.2 ns 1.94 ns 1.72 ns 5,873,753.13 0.0348 328 B
CreateChildSpanWithBaggage Class 169.6 ns 3.17 ns 2.97 ns 5,896,706.30 0.0348 328 B
CreateNewRootSpan Record 222.3 ns 4.48 ns 5.50 ns 4,497,668.25 0.0730 688 B
CreateChildSpan Record 222.9 ns 4.27 ns 4.75 ns 4,485,591.41 0.0730 688 B
CreateChildSpanWithBaggage Record 226.7 ns 2.80 ns 2.48 ns 4,411,065.79 0.0730 688 B

Memory allocation AND observed latency scale with the type of message! This confirms what we already suspected - underlyingMsg.ToString() is the biggest resource hog. C# records have notoriously slow ToString methods because they use a StringBuilder to pretty-print all of their members internally. By contrast, our class type just uses simple string interpolation in an overridden ToString() implementation and is significantly faster.
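
To illustrate the difference with a pair of hypothetical message types (not the ones from our benchmark):

using System;

// the compiler-synthesized ToString() on a record pretty-prints every member via a
// StringBuilder, e.g. "OrderPlaced { OrderId = ..., Sku = ..., Quantity = ... }"
public sealed record OrderPlaced(Guid OrderId, string Sku, int Quantity);

// a hand-written override that only formats what's needed is much cheaper
public sealed class OrderShipped
{
    public Guid OrderId { get; init; }
    public override string ToString() => $"OrderShipped({OrderId})";
}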

If we can defer this expensive .ToString() invocation so that it happens outside of the hot-path, that would be an enormous win for all Phobos users. How could we accomplish this?

As it turns out, there’s support for this kind of deferred stringification in System.Diagnostics.Activity, the base class library type upon which .NET’s OpenTelemetry.Trace APIs are built!

OpenTelemetry tracing pipeline

If you avoid converting things to strings in your hot-path, the OTel processor + exporter pipeline will do it automatically during trace export - this can move expensive allocations from performance-sensitive areas of your application to the out-of-band processing OpenTelemetry does anyway, without any data loss.
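
With the raw Activity API that just means storing the object itself as a tag value instead of its string - a minimal sketch (the ActivitySource name here is made up):

using System.Diagnostics;

public static class ActorTracing
{
    private static readonly ActivitySource Source = new("MyApp.Actors");

    public static Activity? StartReceive(object message)
    {
        var activity = Source.StartActivity("receive", ActivityKind.Consumer);

        // SetTag accepts object? - the exporter stringifies it out-of-band, off the hot-path
        activity?.SetTag("message.content", message);
        return activity;
    }
}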

So how to do that with the TelemetrySpan and SpanAttributes?

// ...rest of CreateActiveSpan method
static IEnumerable<KeyValuePair<string, object>> CreateSpanAttributes(object msg)
{
    yield return new KeyValuePair<string, object>("content", msg);
}

var attributes = new SpanAttributes(CreateSpanAttributes(underlyingMsg));
sp.AddEvent("message", attributes);

return sp;

Unfortunately there’s no SpanAttributes.SetAttribute(string, object) method available in OpenTelemetry as of 1.5.1, although I submitted a pull request to add one: https://github.com/open-telemetry/opentelemetry-dotnet/pull/4672. Therefore, in order to defer stringification you have to pass an IEnumerable<KeyValuePair<string,object>> into the SpanAttributes constructor, which we did here using a static local function.

So what’s the impact?

Method MessageKind Mean Error StdDev Req/sec Gen0 Allocated
CreateNewRootSpan Primitive 152.0 ns 2.58 ns 2.42 ns 6,580,292.48 0.0322 304 B
CreateChildSpan Primitive 149.7 ns 1.98 ns 1.85 ns 6,681,595.57 0.0322 304 B
CreateChildSpanWithBaggage Primitive 150.3 ns 2.71 ns 2.53 ns 6,653,843.06 0.0322 304 B
CreateNewRootSpan Class 150.1 ns 2.74 ns 2.29 ns 6,660,638.95 0.0322 304 B
CreateChildSpan Class 149.1 ns 2.38 ns 2.11 ns 6,704,915.73 0.0322 304 B
CreateChildSpanWithBaggage Class 151.2 ns 2.87 ns 2.69 ns 6,613,827.37 0.0322 304 B
CreateNewRootSpan Record 147.8 ns 2.80 ns 2.62 ns 6,766,604.10 0.0322 304 B
CreateChildSpan Record 147.0 ns 2.75 ns 2.57 ns 6,801,670.75 0.0322 304 B
CreateChildSpanWithBaggage Record 149.7 ns 2.87 ns 2.95 ns 6,678,899.59 0.0322 304 B

The performance for tracing primitive data types, such as string, became slightly worse (~120ns to ~150ns) and allocates slightly more memory (304B instead of 232B) - but all of the other test cases now perform identically, which means the expensive ToString calls have indeed been pushed out of the hot-path.

Creating a TelemetrySpan for a record now takes only 150ns instead of 225ns, a 33% improvement - and we use 56% less memory than we did before, to boot.

Other Notable Performance Mentions

Some other notable mentions for improving performance around OpenTelemetry:

  1. For complex value (struct) types like SpanContext or DateTimeOffset, you’re better off exposing optional parameters as SpanContext? c = null than as SpanContext c = default. The reason: checking whether the caller actually supplied a value via a value-based comparison against default is significantly slower than a simple null check on the Nullable<T> (see the sketch after this list).
  2. Be careful with Baggage method calls - even simple calls like Baggage.ClearBaggage() allocate structs each time you call them. Use these methods only when absolutely necessary.
  3. For other areas where string-ification is necessary AND the data has low cardinality (repeats often), use caching to re-use those same strings over and over again.
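
Here’s the shape of the pattern from point 1, with illustrative names (SpanFactory and _tracer are assumptions, not Phobos types):

using OpenTelemetry.Trace;

public sealed class SpanFactory
{
    private readonly Tracer _tracer;

    public SpanFactory(Tracer tracer) => _tracer = tracer;

    // a null check on SpanContext? is much cheaper than comparing a SpanContext
    // struct field-by-field against default(SpanContext)
    public TelemetrySpan StartConsumerSpan(string operationName, SpanContext? parentContext = null)
    {
        return parentContext is null
            ? _tracer.StartRootSpan(operationName, SpanKind.Consumer)
            : _tracer.StartActiveSpan(operationName, SpanKind.Consumer, parentContext.Value);
    }
}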

Final Results

There were other non-OpenTelemetry fixes that went into Phobos 2.4’s performance as well, such as improvements to our trace filtering system, but the OpenTelemetry-related changes comprised the lion’s share of the improvement. So what were our final results?

Benchmark Max (r/s) Average (r/s) Min (r/s) StdDev (r/s)
Tracing+Metrics 1,911,285 1,879,102 1,837,075 22,929
Tracing 2,470,909 2,418,495 2,374,342 28,270
Metrics 4,051,242 3,945,305 3,873,390 49,505
Uninstrumented 6,979,419 6,117,389 5,739,619 370,057

Our uninstrumented throughput in 2.3.1 was 5,651,058 r/s (requests per second) and is now 6,117,389 r/s - so even that is roughly 8% faster!

But the fully instrumented throughput, tracing + metrics, is what we care about:

  • 2.3.1: 1,145,856 r/s
  • 2.4.0: 1,879,102 r/s

An improvement of roughly 62%! Excellent! All in all, this took about 20 hours of work - well worth it for our customers and for ourselves.

At the beginning of this article we mentioned:

Phobos 2.4.0, which is a staggering 62% faster than all of our previous Phobos 2.x implementations - actually Phobos 2.4 is even faster than that for real-world applications, but we’ll get to that in a moment.

The reason I mentioned that: our benchmark has historically been 66% primitive types / 33% user-defined classes. Those primitive types used to benchmark significantly faster than classes due to the aforementioned stringification costs for classes and records - now they all run at the same rate. User applications, on the other hand, are typically 100% user-defined classes - therefore our benchmark doesn’t fully capture how much faster Phobos 2.4 runs for them! Even better!

If you have any questions, please leave them in the comments!

If you liked this post, you can share it with your followers or follow us on Twitter!
Written by Aaron Stannard on July 21, 2023