OpenTelemetry vs. OpenTracing and the Future of Observability in .NET
In the past we’ve discussed why distributed tracing is becoming commonplace and how the OpenTracing standard is used to instrument libraries and applications. In this post I want to touch on the emerging OpenTelemetry standard, which will become a common component used to instrument ASP.NET Core applications in the not-too-distant future.
What is OpenTelemetry?
OpenTelemetry is the result of merging two earlier projects:
- OpenTracing - developed by a community of APM vendors and library authors; and
- OpenCensus - developed by Google.
The goal is to provide a unified set of APIs that library authors and application developers can use to:
- Propagate distributed tracing context, including the new W3C Trace Context standard for HTTP;
- Aggregate metrics (counters, meters, etc.); and
- Export metrics and trace data to a variety of Application Performance Monitoring (APM) backends, which can be configured entirely by the application developer.
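The W3C standard mentioned above defines a traceparent HTTP header that carries a version, a 128-bit trace ID, a 64-bit parent span ID, and a set of flags. The sketch below shows roughly what "propagating context" means at the wire level. It is written in Python for brevity rather than C#, it handles only the version-00 layout, and the names SpanContext, parse_traceparent, and child_traceparent are hypothetical helpers, not part of any OpenTelemetry package.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Optional

# Version-00 layout: version "-" trace-id "-" parent-id "-" trace-flags, lowercase hex.
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


@dataclass(frozen=True)
class SpanContext:
    trace_id: str  # 32 hex chars, shared by every span in the trace
    span_id: str   # 16 hex chars, identifies the caller's span
    sampled: bool  # lowest bit of trace-flags


def parse_traceparent(header: str) -> Optional[SpanContext]:
    """Return the propagated context, or None if the header is unusable."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid according to the specification.
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def child_traceparent(parent: SpanContext) -> str:
    """Header for an outgoing call: keep the trace ID, use a fresh span ID."""
    flags = "01" if parent.sampled else "00"
    return f"00-{parent.trace_id}-{secrets.token_hex(8)}-{flags}"
```

For example, parsing the header 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01 yields a sampled context whose trace ID every downstream service reuses, which is what lets an APM backend stitch their spans into one trace.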
OpenTelemetry vs. OpenTracing
So what are the material differences between OpenTelemetry and OpenTracing? Why do we need another new standard?
The major technical differences are:
- OpenTelemetry ships the Tracer implementation in its core library - traces are created and correlated by OpenTelemetry’s own code, and vendor-specific code only runs during the export process. This makes OpenTelemetry’s performance very consistent regardless of which vendor end users choose. In contrast, with OpenTracing all of the real work is done by a vendor-specific implementation of the OpenTracing APIs, so as a library author you could have a great set of benchmarks using a Zipkin OpenTracing library but not-so-great ones using a Jaeger OpenTracing library. I prefer OpenTelemetry’s approach here.
- OpenTelemetry supports metrics instrumentation in addition to tracing - a library author can record counter and meter (gauge) metrics alongside distributed tracing without adopting a second instrumentation library. OpenTracing never supported this.
- OpenTelemetry makes it easy to export metrics and traces to multiple backends, since creating traces is decoupled from exporting them to reporting and aggregation services. This was not easy to do in OpenTracing.
- OpenTelemetry supports some helpful features for busy, high-throughput systems, such as pre-aggregation, and in general it has a highly programmable processing pipeline.
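To make the architectural difference in the list above concrete, here is a minimal Python sketch of the OpenTelemetry-style design: a vendor-neutral tracer creates, times, and processes spans, and only the exporters contain backend-specific code, so one tracer can fan out to several backends. All names (Span, Tracer, InMemoryExporter, end_span, flush) are hypothetical and deliberately simplified; they do not mirror the real OpenTelemetry .NET API, which is still in alpha.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Protocol


@dataclass
class Span:
    name: str
    trace_id: str
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    start: float = field(default_factory=time.monotonic)
    end: float = 0.0
    attributes: Dict[str, str] = field(default_factory=dict)


class Exporter(Protocol):
    """The only vendor-specific piece: ships finished spans to one backend."""

    def export(self, spans: List[Span]) -> None: ...


class Tracer:
    """Vendor-neutral tracer: creating, timing and processing spans never touches exporter code."""

    def __init__(self, exporters: List[Exporter],
                 processors: Optional[List[Callable[[Span], bool]]] = None,
                 batch_size: int = 100) -> None:
        self._exporters = list(exporters)
        self._processors = list(processors or [])
        self._batch_size = batch_size
        self._buffer: List[Span] = []

    def start_span(self, name: str, trace_id: str) -> Span:
        return Span(name=name, trace_id=trace_id)

    def end_span(self, span: Span) -> None:
        span.end = time.monotonic()
        # Programmable pipeline: each processor may enrich the span or return False to drop it.
        for process in self._processors:
            if not process(span):
                return
        self._buffer.append(span)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        batch, self._buffer = self._buffer, []
        if not batch:
            return
        # Fan-out: the same batch goes to every configured backend.
        for exporter in self._exporters:
            exporter.export(batch)


class InMemoryExporter:
    """Stand-in for a real backend exporter (e.g. one for Jaeger or Zipkin)."""

    def __init__(self) -> None:
        self.received: List[Span] = []

    def export(self, spans: List[Span]) -> None:
        self.received.extend(spans)
```

For example, Tracer([InMemoryExporter(), InMemoryExporter()], processors=[lambda s: s.name != "healthcheck"], batch_size=2) drops health-check spans and delivers every remaining batch to both exporters. Because exporters only ever see finished spans, adding or swapping a backend does not change the cost of creating spans.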
OpenTelemetry provides backwards compatibility with OpenTracing through a shim layer, because OpenTracing has significant adoption while OpenTelemetry is still quite new and very much a “work in progress.”
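A shim of this kind is essentially an adapter: it exposes OpenTracing-shaped calls while delegating the real work to the new tracer, so existing instrumentation keeps working. The Python sketch below shows the idea under simplifying assumptions. OtelStyleTracer, ShimSpan, and ShimTracer are hypothetical; the method names only loosely follow OpenTracing’s vocabulary (start_active_span, set_tag), real OpenTracing returns a scope object rather than the span itself, and the actual OpenTelemetry shim is considerably more involved.

```python
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List


class OtelStyleTracer:
    """Hypothetical stand-in for an OpenTelemetry-style tracer."""

    def __init__(self) -> None:
        self.finished: List[Dict[str, Any]] = []

    def start_span(self, name: str) -> Dict[str, Any]:
        return {"name": name, "attributes": {}}

    def end_span(self, span: Dict[str, Any]) -> None:
        self.finished.append(span)


class ShimSpan:
    """Offers an OpenTracing-style set_tag by translating it into an attribute."""

    def __init__(self, inner: Dict[str, Any]) -> None:
        self._inner = inner

    def set_tag(self, key: str, value: Any) -> "ShimSpan":
        self._inner["attributes"][key] = value
        return self  # allow chained calls, as OpenTracing-style APIs do


class ShimTracer:
    """OpenTracing-shaped facade: legacy call sites keep working, the new tracer does the work."""

    def __init__(self, tracer: OtelStyleTracer) -> None:
        self._tracer = tracer

    @contextmanager
    def start_active_span(self, operation_name: str) -> Iterator[ShimSpan]:
        inner = self._tracer.start_span(operation_name)
        try:
            yield ShimSpan(inner)
        finally:
            self._tracer.end_span(inner)  # the span finishes when the scope closes
```

Legacy code such as `with ShimTracer(backend).start_active_span("process-order") as span: span.set_tag("order.id", "42")` runs unchanged, while the finished span lands in the new tracer’s pipeline.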
Overall, I think OpenTelemetry is a significant improvement on its technical merits and I strongly prefer its design over the OpenTracing standard. I look forward to being able to adopt it one day.
The Current State of OpenTelemetry in .NET
We recently completed a major proof of concept with OpenTelemetry’s .NET drivers, rewriting our Phobos APM library for Akka.NET to use them as a full replacement for our built-in metrics collection and tracing system.
OpenTelemetry’s .NET packages make it abundantly clear through their semantic versioning that these drivers are all currently in alpha and that the APIs are subject to change. Regardless, we wanted to evaluate their fitness for replacing our OpenTracing implementation and our home-grown metrics instrumentation for Phobos.
We were able to produce some complex traces inside noisy Akka.NET clusters, including areas where Akka.NET and ASP.NET Core integration occurs.
However, we ultimately decided that we can’t adopt OpenTelemetry in its current state, for the following reasons:
- The OpenTelemetry API surface area is about to be entirely redone - the core implementation is moving away from the TelemetrySpan class in the OpenTelemetry.Api package and into the Activity class in the .NET base class library’s System.Diagnostics namespace. There’s more to it than that, but the bottom line is that adopting the package now would likely require us to rewrite 100% of our instrumentation in the near future. That’s not something we want to force onto our users, so we’re going to delay our adoption.
- Metrics support is a second-class citizen in .NET’s OpenTelemetry implementation - virtually all of the content around OpenTelemetry focuses on tracing, and right now there’s really only one exporter for OTel metrics: Prometheus. There’s also little in the way of automatic metrics instrumentation for ASP.NET, HttpClient, or SqlClient, even though all three enjoy built-in tracing support.
- Underlying performance and correlation issues - I don’t know exactly what the cause is, but I brought down my local K8s cluster multiple times while exporting spans to Jaeger from an application with a fairly low amount of traffic. I’ve not had this issue with our OpenTracing implementation. There could be many explanations for this, but our application and library instrumentation changed very little between the two.
- Concerns about extensibility - a big area where I’m worried about the future of OpenTelemetry in .NET is that it may become much less extensible once it becomes part of the .NET BCL. For instance, when correlating traces for Akka.NET actors it’s often not appropriate for us to use an AsyncLocal context, because actors don’t get scheduled that way. We need an alternative location to store our spans in order for correlation to work. OpenTracing supported that use case through its IScopeManager abstraction; the current OpenTelemetry library somewhat supports it through its DistributedContextCarrier abstraction; but since the plan is to eventually remove OpenTelemetry’s APIs in favor of depending directly on the .NET BCL, there may not be a replacement for that in the future. I weighed in on the API discussion for that on the OpenTelemetry .NET GitHub repository.
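The AsyncLocal problem described above can be illustrated with a small simulation. In the Python sketch below, contextvars.ContextVar plays the role of .NET’s AsyncLocal<T>: a caller sets an ambient span and enqueues a message into an actor’s mailbox, and a dispatcher later processes the message in a fresh context, so the ambient span is gone. Capturing the span in the message envelope and restoring it onto an actor-owned scope keeps the correlation, which is the kind of hook an IScopeManager-style abstraction makes possible. Actor, Envelope, run_dispatcher, and send_from_request are hypothetical names, and the envelope-based approach is an assumption for illustration, not how Akka.NET or Phobos actually implement it.

```python
import contextvars
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional

# Ambient "current span": Python's rough analogue of .NET's AsyncLocal<T>.
ambient_span = contextvars.ContextVar("ambient_span", default=None)


@dataclass
class Envelope:
    payload: str
    span: Optional[str]  # sender's span, captured when the message is enqueued


class Actor:
    def __init__(self, use_actor_scope: bool) -> None:
        self.mailbox: Deque[Envelope] = deque()
        self.use_actor_scope = use_actor_scope
        self.active_span: Optional[str] = None  # actor-owned scope (hypothetical)
        self.correlated_with: List[Optional[str]] = []

    def tell(self, payload: str) -> None:
        # Runs on the sender's side, where the ambient span is still visible.
        self.mailbox.append(Envelope(payload, ambient_span.get()))

    def receive(self, envelope: Envelope) -> None:
        if self.use_actor_scope:
            # Restore the sender's span onto the actor before handling the message.
            self.active_span = envelope.span
            parent = self.active_span
        else:
            # Rely on whatever span happens to be ambient on this worker.
            parent = ambient_span.get()
        self.correlated_with.append(parent)


def run_dispatcher(actor: Actor) -> None:
    """Drain the mailbox later, each message in a fresh context, as a thread-pool worker might."""
    while actor.mailbox:
        envelope = actor.mailbox.popleft()
        contextvars.Context().run(actor.receive, envelope)


def send_from_request(actor: Actor, span: str) -> None:
    """Simulate a caller (e.g. an HTTP request handler) whose span is ambient."""
    def caller() -> None:
        ambient_span.set(span)
        actor.tell("process-order")

    contextvars.copy_context().run(caller)
```

Running send_from_request followed by run_dispatcher on an actor with use_actor_scope=False records no parent span, while the same sequence with use_actor_scope=True records the caller’s span, which is why a pluggable place to keep the active span matters for actor-based systems.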
All in all, I think the OpenTelemetry efforts are very promising and exciting for .NET application developers and library authors alike, but we might still be a year or more away from a stable library.
What We Decided to Do
Given how our OpenTelemetry proof of concept went, here’s what we decided to do with our Phobos product for its 1.0.0 release:
- Stick with our OpenTracing implementation for tracing, but move all of the configuration into code rather than HOCON; and
- Replace our home-grown metrics instrumentation with App.Metrics, a popular and performant .NET OSS metrics instrumentation library that supports many reporting back-ends, such as Datadog, Prometheus, Application Insights, InfluxDB, and more.
We look forward to following OpenTelemetry’s progress and are very much invested in the project’s success long-term.