How Distributed Tracing Solves One of the Worst Microservices Problems

Distributed Tracing Solves Some Big Pain Points with Microservices

Historically, most web applications were developed as monolithic architectures: the entire application shipped as a single process implemented on a single runtime. This architectural choice makes scaling software development painstaking and tedious, because every code change submitted by any development team targets a single code base. Under monolithic designs, deployments are “all or nothing” affairs - either you deploy all parts of your application at once or you deploy none of them.

It’s for painful reasons like this that the industry is moving away from monoliths and towards distributed architectures such as microservices. What microservices provide isn’t product scalability; they provide people scalability - the ability to easily partition your software development organization along the same lines as you partition your application with service boundaries.

An organization partitioned along its microservice boundaries

DevOps Implications of Microservices

Microservices provide enormous agility and flexibility to software development organizations. By partitioning our large applications into interdependent services which communicate via explicit network communication contracts, each team encapsulates its implementation from the others. This makes it possible, in theory, for each team to choose the best tools for the job - if Service 1’s requirements are best satisfied using Node.js and Redis but Service 2’s are better handled with .NET and SQL Server, both of these teams can make those choices and develop and deploy their services independently of each other.

In practice, microservices trade one set of organizational and technical problems for another. Their benefits amount to greater independence, clearer organizational boundaries and division of labor, and greater agility, but those benefits come with some distinct costs:

  1. Loss of coherence - the work to fulfill a single end-user request is now broken...

Introducing Phobos: Enterprise DevOps Suite for Akka.NET

Zero-code Actor Monitoring and Cross-Cluster Request Tracing

Today it’s my pleasure to announce the production-ready release of Phobos, an enterprise DevOps suite for Akka.NET developers.

The initial release of Phobos is primarily aimed at solving the following problems:

  1. Instrumenting and monitoring activity from actors inside large Akka.NET applications and exporting it to common, off-the-shelf monitoring tools used by .NET enterprises;
  2. Implementing OpenTracing protocols behind the scenes and over the network stack of Akka.NET, so we can provide end-to-end distributed tracing for Akka.NET actors; and
  3. Doing all of the above automatically without any instrumentation code or setup of any kind. Phobos can be installed into any existing Akka.NET application and capture all of this data with as little as 5 lines of HOCON configuration.
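
To make the end-to-end tracing idea in item 2 concrete, here is a minimal toy sketch of trace-context propagation across a service or actor boundary. It is not Phobos's implementation and does not use the Akka.NET or OpenTracing libraries; the `SpanContext` type, the `start_span`, `inject`, and `extract` helpers, and the header names are all hypothetical, loosely modeled on the inject/extract idea from OpenTracing.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpanContext:
    """Hypothetical trace context carried along with each message."""
    trace_id: str                    # shared by every span in one end-to-end request
    span_id: str                     # unique to this unit of work
    parent_id: Optional[str] = None  # the span that caused this one, if any


def start_span(parent: Optional[SpanContext] = None) -> SpanContext:
    """Start a root span, or a child span that inherits the parent's trace_id."""
    span_id = uuid.uuid4().hex[:16]
    if parent is None:
        return SpanContext(trace_id=uuid.uuid4().hex, span_id=span_id)
    return SpanContext(trace_id=parent.trace_id, span_id=span_id,
                       parent_id=parent.span_id)


def inject(ctx: SpanContext) -> dict:
    """Serialize the context into message headers (header names are made up)."""
    return {"x-trace-id": ctx.trace_id, "x-span-id": ctx.span_id}


def extract(headers: dict) -> SpanContext:
    """Rebuild the sender's context on the receiving side."""
    return SpanContext(trace_id=headers["x-trace-id"],
                       span_id=headers["x-span-id"])


# The sender starts a trace and attaches its context to the outgoing message.
root = start_span()
headers = inject(root)

# The receiver continues the same trace with a new child span.
child = start_span(parent=extract(headers))
```

Because the child span keeps the sender's `trace_id` and records the sender's `span_id` as its parent, a tracing backend can stitch the two hops back into a single request timeline.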

I’d encourage you to watch the “Introduction to Phobos: Enterprise DevOps for Akka.NET” video we put together which explains this in more detail. It’s only about 5 minutes long.

Integration with Third Party Monitoring and Tracing Services

Phobos is designed to act primarily as instrumentation for your Akka.NET applications; it ships all of the metrics and trace data it records to off-the-shelf monitoring and tracing products. This includes a variety of open source and proprietary tools chosen by our customers and users. We know how hard it can be to introduce new tools inside large enterprise environments, so our goal is to be able to support whatever you and your organization currently use.

And if you’ve never used something like Zipkin or StatsD, we’ve collected some docker-compose scripts you can use to test drive those technologies (with or without Phobos) here.

Trying and Using Phobos

Phobos has been in the works for about a year and we’ve had...

Akka.Cluster is one of the most popular and useful parts of the Akka.NET ecosystem as a whole, but it’s also one of the most concept-heavy areas. We have a lot of literature on both the official Akka.NET documentation and elsewhere on our blog about concepts such as distributing state in Akka.Cluster, sharding data across cluster nodes using Akka.Cluster.Sharding, publishing messages across a cluster, and so on; however, that barely scratches the surface of the possibilities and uses of Akka.Cluster.

So my goal with this post is to provide a bit of an FAQ on some of the most important and central concepts needed to build and operate effective Akka.NET clusters.

Node Reachability vs. Membership

In Akka.Cluster there are two important, similar-looking concepts that every end-user should be able to distinguish:

  1. Node reachability - is this node available right now? Can I connect to it?
  2. Node membership - is this node a current member of the cluster? Is this node leaving? Joining? Removed?

When many users start working with Akka.Cluster, they operate from the assumption that these two concepts are the same. “If I kill a process that is part of an Akka.NET cluster, that process will no longer be part of the cluster.”

This assumption is incorrect and there’s an important distributed computing concept at work behind this distinction: partition tolerance.
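
The distinction can be illustrated with a small toy model. This is a conceptual sketch only, not the Akka.Cluster API: `ToyCluster`, `Node`, and the simplified `MemberStatus` values are hypothetical names, and the model assumes that a failed node stays a member until something explicitly removes it.

```python
from dataclasses import dataclass
from enum import Enum


class MemberStatus(Enum):
    # Simplified lifecycle for illustration; not the full set of real statuses.
    JOINING = "joining"
    UP = "up"
    LEAVING = "leaving"
    REMOVED = "removed"


@dataclass
class Node:
    address: str
    status: MemberStatus = MemberStatus.UP  # membership: is it part of the cluster?
    reachable: bool = True                  # reachability: can we talk to it right now?


class ToyCluster:
    def __init__(self):
        self.nodes = {}

    def join(self, address):
        self.nodes[address] = Node(address)

    def mark_unreachable(self, address):
        # A killed process or network partition only changes reachability.
        self.nodes[address].reachable = False

    def remove(self, address):
        # Membership changes only through an explicit decision.
        self.nodes[address].status = MemberStatus.REMOVED

    def members(self):
        return sorted(a for a, n in self.nodes.items()
                      if n.status is not MemberStatus.REMOVED)


cluster = ToyCluster()
for address in ("node-a", "node-b", "node-c"):
    cluster.join(address)

# Kill node-b's process: it becomes unreachable but is still a member.
cluster.mark_unreachable("node-b")
members_after_kill = cluster.members()

# Only an explicit removal takes it out of the membership list.
cluster.remove("node-b")
members_after_removal = cluster.members()
```

In this model, `members_after_kill` still contains `node-b` even though it is unreachable, which is the gap between the two concepts that the incorrect assumption overlooks.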

In terms of the CAP theorem, Akka.Cluster provides an AP experience out of the box; Akka.Cluster developers typically trade away some of the cluster’s default availability and partition tolerance (A & P) in exchange for consistency in areas where their domains demand it.

Akka.Cluster’s partition tolerance abilities come from this “reachability” feature; in order to tolerate partitions you have to know where they are and what resources are affected by them...