The first lines of CircleCI’s codebase were written nearly nine years ago. Looking back on what we have built so far with the clarity of hindsight exposes some interesting themes that can be instructive in identifying approaches to dealing with change. They weren’t all intentional, but that doesn’t make them any less valuable. Three such themes are: deferring the need to handle change, thinking like a product manager, and keeping your head up.
Change is a constant in technology
Change might be one of the only constants engineers will deal with over the course of their careers. It certainly drives many of our most interesting challenges. The market is changing, our business is evolving, our customer base is growing, and our team is scaling. We need to build a solution that meets today’s needs but sets us up for the demands of tomorrow. How do we design our systems so that they can adapt to things that don’t even exist when we start building them?
There has been a lot of talk in recent years about architectures that are specifically designed to evolve or more easily adapt to change. Then we start to debate the merits of these architectures using a line of thinking that suggests you are choosing to have a “Microservices Architecture” or an “Event-Driven Architecture” or a “Serverless Architecture.” I would posit that this type of description creates a false sense of finality that doesn’t actually exist in most real-world systems.
Outside of the youngest of projects, it’s unlikely that you can describe the architecture that you have with any simple one-liner. It’s more likely just “your architecture,” and if you’ve been working on it for any significant amount of time, you may have applied multiple different patterns along the way to solve specific problems. There’s probably a monolith somewhere, some number of microservices, a few events, and a serverless element or two. On top of all that, you might be in the middle of transitioning some of those pieces from one pattern to another.
Design architecture to solve problems
I’m often asked by folks which architecture they should choose, or when they should switch from one to another. I have a great tendency to disappoint with answers like “what problem are you trying to solve?”
So, what should you be thinking about as your product evolves and your needs are changing?
There are a lot of approaches that we can use to design software and systems to be more resilient to change. The allure of not having to do complex refactoring or rebuilding down the road leads us to want to design systems that have a low cost of change. On the other hand, while generalized abstractions and reusable components are at the core of reducing the impact of change, they are hard to get right in the first place.
It’s hard to predict the future
At the same time that the technical landscape is changing, your business and your organization are both evolving as well. These both have a significant impact on your architectural decisions. The specific types of change that you are undergoing will influence the software architecture best suited to absorb those changes.
In the early days of a project or company, it’s possible to make sweeping fundamental shifts in the objective. Extreme, but not uncommon, examples include Tiny Speck becoming Slack and Odeo becoming Twitter. When the domain of your business can move that drastically, the clean boundaries of your domain-driven design (DDD) model are out the window. In a set of microservices, these boundaries will be codified, so if you have to cut through them, you’ll likely be thankful for a more pliable monolith.
Once you find product-market fit, your priorities will shift towards supporting your rapid growth. With system scale as a driver of change, the ability to respond to change is often tied to the independence of systems based on their operational characteristics.
After that scaling phase, you’ll have a larger organization and far more teams. If you’re lucky you’ll be back to sustainable product evolution. Then the cost of change will be proportional to the amount of cross-team coordination required to make a change.
Defer response to change
When an individual or team has a problem they’re trying to solve, they generally start off with a pretty simple directive – let’s build to solve that problem. What’s the most effective architecture we can use to build a viable product at the lowest cost? This is generally true whether it’s a startup or a huge multinational corporation. A new project will be built on a system that’s relatively new, and will be unlikely to have product-market fit because the company won’t fully understand its business domain yet, and won’t know where the primary sources of change will come from.
In this situation, designing for change becomes a lot harder. If it’s a lot more expensive to make things resilient to change and you can’t tell yet which parts of your system will change, how do you make the decision about where to make that investment?
As counterintuitive as it may sound, the answer is probably nowhere. At least not yet. Instead watch change closely and build in a way that minimizes the cost of being wrong.
“The presence of two options is an indicator that you need to consider uncertainty in the design. Use the uncertainty as a driver to determine where you can defer commitment to details and where you can partition and abstract to reduce the significance of design decisions. If you hardwire the first thing that comes to mind, you’re more likely to be stuck with it – incidental decisions become significant and the softness of the software hardens.”
This framing is great for explicit decisions, but what if you don’t know you’re making a choice?
Many times, the choice isn’t even visible yet, but will reveal itself later. In these situations, the answer is not to overgeneralize, building abstractions everywhere just in case. Instead, keep things as simple as possible so you can understand them later if you have to make a change.
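As a minimal sketch of what deferring commitment can look like in code, consider the hypothetical example below. Every name in it (`record_build_output`, `InMemoryStore`, `SaveFn`) is invented for illustration and does not come from CircleCI’s codebase. The uncertain decision, where build output is stored, is passed in as a plain function rather than hardwired or wrapped in a plugin framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical names throughout; this is an illustration, not real product code.
SaveFn = Callable[[str, bytes], None]


@dataclass
class InMemoryStore:
    """The simplest storage that works today."""
    blobs: Dict[str, bytes] = field(default_factory=dict)

    def save(self, key: str, data: bytes) -> None:
        self.blobs[key] = data


def record_build_output(build_id: str, output: bytes, save: SaveFn) -> str:
    """The only place that knows how build output is keyed.

    The storage decision arrives as a plain function, so replacing it later
    touches this one call path instead of the whole program.
    """
    key = f"builds/{build_id}/output"
    save(key, output)
    return key


store = InMemoryStore()
key = record_build_output("42", b"ok", store.save)
```

Nothing here anticipates a particular future backend. The point is only that the cost of being wrong about storage stays small, because the decision lives in one argument.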
Simplicity makes it easier to adapt to change
In the early days, the CircleCI application was a monolith that took a customer’s build with its associated data and pushed it into one of several LXC containers. Every container that we spun up was instantiated from the same image that contained everything that we thought anyone would ever want in their test environment. In hindsight, this sounds like a terrible idea, but at the time, it was fantastic. It was simple to maintain and supported the needs of our early customers, many of whom were building Rails monoliths.
As time passed and our customer base grew, so did the diversity of their needs in terms of test environments. Upgraded versions of their underlying databases, new development frameworks, even new operating system versions became necessary.
The original container management in CircleCI wasn’t designed in a way that allowed us to adapt easily to these changing needs. But it was fairly simple. So when we set out to solve these problems, we knew where to splice in a new approach and it was minimal work to enable that splicing. We also didn’t have to unravel a poor generalization that didn’t support our new problem.
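To illustrate why a simple design makes the splice point easy to find, here is a hypothetical sketch. It is not CircleCI’s actual container management code; `choose_image`, `start_container`, `run_build`, and the image names are all assumptions made up for the example. In the original shape, `choose_image` would simply return the one shared image. Supporting customer-selected environments then means changing that single function.

```python
from typing import Dict, List

DEFAULT_IMAGE = "everything-preinstalled"  # hypothetical image name


def start_container(image: str) -> str:
    """Stand-in for real container startup; returns a fake container id."""
    return f"container:{image}"


def choose_image(build_config: Dict[str, str]) -> str:
    """The splice point.

    Originally this would just return DEFAULT_IMAGE. Letting a build pick
    its own environment only requires a change here.
    """
    return build_config.get("image", DEFAULT_IMAGE)


def run_build(build_config: Dict[str, str], steps: List[str]) -> Dict[str, object]:
    """Start an environment for the build and report what would be run."""
    container = start_container(choose_image(build_config))
    return {"container": container, "steps_run": list(steps)}
```

Calling `run_build({}, ["test"])` uses the shared default image, while `run_build({"image": "ruby-2.7"}, ["test"])` takes the new path. Existing builds keep working without any change to their configuration.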
It’s important to note that, simple as that system was, it met the needs of our customers through five years of successful growth as a company before our first customers tested its replacement. In those five years, Docker was created, as was HashiCorp’s Nomad. Combined, those tools eliminated huge portions of the work necessary to get the flexible and scalable environments that we support for customers today.
Also, as we retooled the system to adapt to the changed market, we were in a position to ask “How do we do this in a way that better positions us for incremental change?” It would be difficult to overstate how much value those five years of experience provided when designing a solution.
It sounds weird and grossly unfair to say this, but most of the time we don’t even know that we’re making a monumental design decision because the alternate path hasn’t shown up yet. How do you guard against this?
Defer, defer, defer
The correct solutions have a habit of revealing themselves if you can find a way to wait long enough. Technology gains traction or dies. If you chose a container orchestration engine in 2016, there was a roughly 20% chance that you chose Kubernetes. That means that by the beginning of 2018, almost 80% of companies were switching. Those are not great odds.
So deferring can be good. Deferring to the point of creating a crisis is not.
Think like a product manager
Your architecture is a leaky abstraction. While CircleCI is probably an extreme example of this case due to the access our customers have to systems in our platform, there is always some implication for customers of the decisions you make in defining your architecture. Recognizing this impact and being able to coherently discuss approaches and alternatives is a huge asset that an engineer can bring to their PM and team.
While we were witnessing all of this change in how our customers were building software, one thing we ignored for too long was the rise of Docker. Docker didn’t even exist when CircleCI started in 2011, but by 2014 it was everywhere. Many of our customers had started building Docker images as part of their build to prepare for deploy.
Docker was originally built on top of LXC, and that meant in those early days we could support the use of Docker commands to build images and push them to repositories all inside of one of our LXC containers. In 2014, Docker launched something called libcontainer and they pulled apart the access to the underlying system and created execution drivers, which soon led to the deprecation of the LXC driver. Disappointing, but it still worked. Then Docker deleted all LXC support. That was bad.
Always think about users
As engineers and architects, we saw this coming, but frankly, it was under the radar of our product team. Our non-technical product folks didn’t necessarily understand the implications, but our engineers were talking about it every day. So as an engineer, an architect, or a leader in technology, you need to be thinking about the direction of the product and working with your product managers to ensure everyone understands the implications of technical choices.
It’s important to separate this discussion from technical investments. There is absolutely a place for that type of investment, and there is an argument that framing those investments in the same way that you frame product investments makes tradeoff decisions much easier. However, identifying the directly customer-visible feature impact of architectural choices is different from the more commonly discussed impacts of cost, performance, security, etc.
Building a better understanding of the relationship between your architecture and the value achieved by your customers will put you in a position to make more informed decisions about what and how to build as your business evolves. Then you can focus on getting ahead of that evolution.
Keep your head up
I’m Canadian and, unsurprisingly, I grew up playing ice hockey. Throughout my childhood, I spent an endless amount of time at practice just skating: forwards, backwards, turning, stopping, figure eights around cones. Then came stick handling. The idea was to have the movement on the ice become second nature so I could focus on what really mattered: seeing the play develop, getting into space, and trying to score. Make the fundamentals almost effortless so you can put your energy into the novel stuff.
In software, we have a bad tendency to make the fundamentals a lot harder than they should be by adopting new technology that we think is going to be game changing, but ends up having modest upside if any. To make things worse, the downside is often unbounded as we get our teams up the learning curve, find untested edge cases during production incidents, and invent our own “best practices.” A world where Stack Overflow has no answers.
While we’re putting this effort into our novel tech, we’re not focusing on how our systems need to evolve to meet the needs of our customers.
A time simplicity would have helped
CircleCI is a Clojure shop and, in our early days, we decided we’d be better at frontend development if we used the same language that we used on the backend. So we adopted Om, a ClojureScript wrapper around React. A few years later, David Nolen, the person who started Om, decided he didn’t like the model, so he replaced it with something incompatible called Om Next. We tried to migrate incrementally to Om Next and ended up with an overly complex frontend with two state models and massive overhead on every change.
We ended up rewriting it in React. In retrospect, Om had both high risk and high cost to change and we paid the price. And we spent that time looking at our feet, not ahead.
No company has ever won because of an amazingly novel or esoteric technology choice. There is an endless list, however, of companies that have won because they can move quickly and with agility, adapting to change as it happens. Technology should be an accelerator helping us meet the needs of our customers. Well-understood, production-tested tools are far more likely to fit that bill. As my colleague Bear says, “If a tool isn’t helping, it’s not a tool, it’s a chore. Drop it.”
Unforeseen change is top-of-mind for so many of us right now, and it’s more obvious than ever that, in business, nobody can predict the future. It’s a safe bet that folks at GM didn’t start the year thinking about how to use their factories to make ventilators.
Being prepared to adapt in the face of change requires thinking about change as a driver in everything you do. Watch how your market is changing and reflect on how that impacts your architecture so that you can make targeted, incremental improvements in its adaptability. This will help keep you focused on evolution without wasting time or money on areas that won’t need it.
Black Swan events are so unusual that studying the specifics can be low value, but they remind us how far reality can be from our plans.