A concept I’ve been trying to put a name to is software application architectures that inherently lend themselves to a lower rate of technical debt accumulation than others. This post is my first attempt to do that.

Technical Debt

Technical debt is a term that is used frequently in our industry, and while its meaning is commonly understood among experienced technologists, it’s not clearly defined.

In the abstract:

  • Technical debt is cost incurred from software design and implementation choices made in the past;
  • Interest on that technical debt accrues and compounds as a result of subsequent decisions that are layered onto the original set of choices; and
  • The full cost of technical debt is not known until, at some point in the future, there is a need to modify the software system which forces the development team to modify the original choices built into the existing system and calculate the level of effort needed to change them safely.

Here’s what makes technical debt so complicated to pin down: its real cost depends on what might happen in the future as circumstances, environments, business requirements, and so on change. These changes can be really difficult to anticipate at the outset of a greenfield software project, even for well-intentioned, experienced, and disciplined programming teams.

Example: Database-Driven Development

One of the most painful experiences of my software development career was at my last startup, MarkedUp, where we had to migrate off of RavenDb and onto Apache Cassandra under dire circumstances in early 2013.

Our service had only been live for maybe 45 days, but in that time we’d acquired a ton of new users, way beyond our most optimistic expectations: consecutive, compounding 200-400% activity growth over several 3-4 day periods, going from about 10k events per day to about 5-8 million. And even though we’d thoroughly tested our software, there wasn’t much data available on how RavenDb’s MapReduce indices would perform over time. We were early adopters.

As we discovered, RavenDb collapsed under a modest amount of traffic, 30 writes per second or so, and the MapReduce indices we’d used to power many of our “real-time” analytics would simply fail to update for days at a time. The only fix we found for the indices was to continuously migrate from one very large EC2 instance to another using a homegrown migration tool, since Raven’s own tool would collapse under load. Because newer data, which is more valuable in a real-time analytics system, got migrated first, that gave the indices a chance to catch up on the new system.

NOTE: RavenDb is assuredly a better technology now, but it was a total disaster back then. Don’t judge RavenDb today by how it performed back in early 2013.

This is where the technical debt came into the picture: I listened to some of RavenDb creator Ayende’s advice and took it to heart. From his post, “Repository is the new Singleton”:

So, what do I gain by using the repository pattern when I already have NHibernate (or similar, most OR/M have matching capabilities by now)?

Not much, really, expect as additional abstraction. More than that, the details of persistence storage are:

  • Complex
  • Context sensitive
  • Important

Trying to hide that behind a repository interface usually lead us to a repository that has method like:

  • FindCustomer(id)
  • FindCustomerWithAddresses(id)
  • FindCustomerWith..

It get worse when you have complex search criteria and complex fetch plan. Then you are stuck either creating a method per each combination that you use or generalizing that. Generalizing that only means that you now have an additional abstraction that usually map pretty closely to the persistent storage that you use.

From my perspective, that is additional code that doesn’t have to be written.

So that’s exactly what we did: we embedded RavenDb-specific driver code in all of our HTTP methods, ingestion API, and so forth. No more of the classic IRepository pattern for us. The moment we decided to do that was the moment I unwittingly chose to spend hundreds of thousands of investor dollars migrating off of RavenDb, versus the much smaller number it would have cost had I made some different choices.

Let’s separate two components of the cost:

  1. The cost of migrating current data from one database to another - technical debt but in a class of its own. Even between two identical T-SQL systems there is an inescapable cost to migrating data from one instance to another. You have to create a process to do the transform, account for errors, re-transmit the failed parts, eliminate duplicates, and so on.
  2. The cost of reimplementing the application’s current read / write models against the new database - this is where the technical debt primarily accumulates in this scenario.
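The first cost, the data migration itself, can be sketched roughly as follows. This is a minimal illustration in Python, not the tool we built at MarkedUp: the `migrate` function, the `id` field used for de-duplication, and the `transform` / `write` callbacks are all hypothetical names chosen for the example.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, object]


def migrate(
    source: Iterable[Record],
    transform: Callable[[Record], Record],
    write: Callable[[Record], None],
    max_attempts: int = 3,
) -> Tuple[int, List[Record]]:
    """Copy records from a source into a target through write().

    Returns (number of records written, records still failing after
    max_attempts passes). Assumes every record has a unique "id" key.
    """
    seen_ids = set()            # used to eliminate duplicates
    pending = list(source)
    written = 0

    for _ in range(max_attempts):
        failed: List[Record] = []
        for record in pending:
            if record["id"] in seen_ids:
                continue        # already migrated, skip the duplicate
            try:
                write(transform(record))
            except Exception:
                failed.append(record)   # account for the error, retry later
                continue
            seen_ids.add(record["id"])
            written += 1
        if not failed:
            return written, []
        pending = failed        # re-transmit only the failed parts

    return written, pending
```

Even this toy version has to make decisions about retries and duplicates, which is the point: this cost exists no matter how well the application code is abstracted.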

The repository pattern is basically an abstraction designed to create persistence ignorance - a principle long-recommended in the early to late 2000s for allowing application developers to create and test their business logic and their persistence models independently. It has, at least in .NET Twitterland, fallen out of favor and been replaced with exactly what we did at MarkedUp: programming directly against the first party features of the database.

The benefits of doing this are, as explained in the blog post quoted above, that it’s easier to implement complex read/write models and it’s one less layer of abstraction that needs to be written and maintained.

But as we discovered, the downsides to developing directly against the database are massive from a technical debt perspective:

  1. Because your read and write models closely follow the particulars of the database you chose, your code is effectively married to it. If your needs change or your database fails to grow with you, you are screwed.
  2. Integration testing is the only realistic option for testing your code, as it’s married to your database, and your business logic isn’t truly independent of your persistence model.
  3. Without a set of shared abstractions that distill down your persistence patterns into some lowest common denominators, you instead have bespoke use-case specific persistence code everywhere - and rewriting all of it everywhere at once is extremely high risk and expensive.
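For contrast, here is a minimal sketch of the kind of shared abstraction those three points are about. It is written in Python for brevity and is not MarkedUp’s code: `Customer`, `CustomerRepository`, `InMemoryCustomerRepository`, and `record_event` are hypothetical names invented for this illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Customer:
    id: str
    name: str
    events_logged: int = 0


class CustomerRepository(ABC):
    """The only persistence surface the business logic is allowed to see."""

    @abstractmethod
    def find(self, customer_id: str) -> Optional[Customer]: ...

    @abstractmethod
    def save(self, customer: Customer) -> None: ...


class InMemoryCustomerRepository(CustomerRepository):
    """Test double. A database-backed class would implement the same two methods."""

    def __init__(self) -> None:
        self._rows: Dict[str, Customer] = {}

    def find(self, customer_id: str) -> Optional[Customer]:
        return self._rows.get(customer_id)

    def save(self, customer: Customer) -> None:
        self._rows[customer.id] = customer


def record_event(repo: CustomerRepository, customer_id: str) -> Customer:
    """Business rule: count one event against a customer. No driver calls here."""
    current = repo.find(customer_id) or Customer(id=customer_id, name="unknown")
    updated = Customer(current.id, current.name, current.events_logged + 1)
    repo.save(updated)
    return updated
```

With this shape, `record_event` can be unit tested against the in-memory class, and swapping databases means writing one new `CustomerRepository` implementation rather than touching every call site. The tradeoff Ayende describes still applies: complex queries and fetch plans are harder to express through an interface this narrow.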

Programming directly against a database is a high-risk, low-reward bet: if everything goes right, you have one less layer of code to understand and one less common abstraction for developers to bicker about. All you have to do is sacrifice testability, a type-safe way of enforcing standardized approaches to reads / writes, and pray that nothing goes wrong - because if it does, you have 10s to 100s of bespoke driver invocations embedded directly into your application code that have to be replaced together in large groups, if not all at once.

This is why the wisdom of persistence ignorance is trumpeted by the experienced web programmers of the 1990s: this has been tried before and failed spectacularly. NoSQL and distributed K/V databases don’t change that.

Marrying your application code, your read / write models, and really your business domain to a specific OR/M or database implementation is a classic technical debt creator: it’s an assumption that the database will continue to grow with the software in perpetuity. It doesn’t price in the possibility, or in many cases the inevitability, that this will not hold true.

This is a decision that destroys optionality - the ability to preserve future choices for the software without rewriting it.

Optionality

I used database-driven programming as an example of optionality destruction or “low optionality programming.” It’s an inflexibility built into the system from its inception - and if you never run into an instance where that inflexibility becomes an impediment to implementing a future change in your software, then you don’t have any technical debt. But as was the case with MarkedUp, that inflexibility can become a highly compounded source of technical debt that demands a high price in time, stress, and dollars to be repaid.

This is the context of optionality and its role in reducing technical debt: it is the upfront decision to “price in” future changes to the software and design the system to be able to support them. The nature of those future changes is not definitively known or agreed upon at the conception of the system but it is accepted that their arrival is highly probable.

Definition

“Optionality” is a term I most often come across in finance, i.e. stock options, but I’m going to attempt to explain it on my own terms. Optionality simply means “to have options,” but every option comes with four parts:

  1. A premium - i.e. a reservation fee, a deposit, or any other kind of upfront cost you pay to acquire the option;
  2. An exercise cost - a fixed cost to exercise the option, agreed upon at the time it’s created;
  3. A right to exercise - a set of agreements on when and how you can exercise your option, established at the time you paid the premium; and
  4. An expiration date - options don’t last forever; an invitation to speak at a conference only lasts so long as the call for papers is still open.

If you receive stock options in the company you work for as part of your annual compensation, your “premium” is your on-the-job performance; your exercise cost is literally the exercise or strike price of the stock; your right to exercise is subject to the vesting schedule; and your expiration date is the length of your exercise window, usually 10 years or so.

The expiration date is the key - the further in the future the expiration date, the more valuable the option is and, usually, the higher the premium.

Optionality and Technical Debt

In programming, the premium we pay is the set of upfront development costs at the outset of a new project. If you want to get a rapid prototype into production quickly, as we did at MarkedUp, you’re going to pay a lower premium at the cost of having more expensive future choices - technical debt.

Technical debt is the price you might pay later; optionality is the price you will pay now in order to reduce possible future cost.

That’s the bet - and you are always making it every day on the job, knowingly or not.
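One way to make the bet concrete is to compare probability-weighted costs. The sketch below is a simplification under stated assumptions: the `expected_cost` and `break_even_probability` helpers are hypothetical, the premium is measured as the extra upfront cost relative to the quick prototype, and every figure is made up for illustration rather than taken from MarkedUp.

```python
def expected_cost(premium: float, change_probability: float, cost_if_change: float) -> float:
    """Upfront cost plus the probability-weighted cost of a future change."""
    return premium + change_probability * cost_if_change


def break_even_probability(premium: float, rewrite_cost: float, exercise_cost: float) -> float:
    """Probability of change above which paying the premium is the cheaper bet."""
    return premium / (rewrite_cost - exercise_cost)


# Hypothetical figures in developer-weeks.
# Low optionality: no premium, but a 40-week rewrite if the change arrives.
low_optionality = expected_cost(premium=0, change_probability=0.25, cost_if_change=40)

# High optionality: a 4-week premium now, then a 6-week planned change later.
high_optionality = expected_cost(premium=4, change_probability=0.25, cost_if_change=6)

threshold = break_even_probability(premium=4, rewrite_cost=40, exercise_cost=6)
```

The interesting output is `threshold`: with these made-up numbers, the premium pays for itself whenever the change is more likely than that probability. The hard part in practice is estimating the inputs, not doing the arithmetic.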

If you’ve been in the industry a number of years, you have likely learned how to make rapid prototypes - often a necessity on the job. That’s a quick skill to learn because of its immediacy. Learning how to do the opposite, to plan for anticipated but not fully qualified future changes, is a harder skill to practice because it requires planning out the evolution of a software system over many years and sticking around long enough to see which bets paid off and which ones did not.

Let’s revisit optionality in terms of software:

  1. A premium - this is your upfront development cost to complete a feature, build a V1, etc.;
  2. An exercise cost - a planned route you can take to implement one of several possible types of changes in your system; you’ve thought through, ahead of time, how these types of changes can be introduced to the system gradually and have bounded their costs;
  3. A right to exercise - you can exercise at any time prior to expiration; and
  4. An expiration date - here’s the great part: technology options are good for as long as the original component is relevant to the software and the business supporting it.

Software options have a long, long expiration date - this is the power center of high optionality software.

Patterns for Creating High Optionality Software

I am going to have to expand on this in a part 2, where I will detail the following recipes for building more optionality into your software:

  1. Prefer event-driven programming over remote procedure calls;
  2. Persistence ignorance is bliss, but event-sourcing is better;
  3. Command Query Responsibility Segregation;
  4. Apply functions over data - decouple stateful domain objects from business rules;
  5. Use actors to make systems dynamic, queryable, and recoverable; and
  6. Embrace extend-only design on schemas of any kind.

Many of you reading this may be predisposed, one way or another, to some of these ideas already. My goal in writing this is not to convince you that your current way of writing software is wrong or even that it can be improved. My goal and Petabridge’s is to expand the power of software developers.

My goal is to share a lexicon and tools to prepare for the long-term evolution of our software systems - to make our tradeoffs planned and intentional rather than done out of habit. And most importantly - to explore why paying a small premium today might create tremendous dividends for you and your team members tomorrow.

I plan to have our second installment of this post done soon. Subscribe for the next post!

Written by Aaron Stannard on September 15, 2021