History and future of DevOps
Apr 30th, 2021 | 51 minutes
Jim is the CEO of CircleCI, a continuous integration and delivery platform used by the world’s best engineering teams. Jim joined CircleCI through the acquisition of Distiller, an iOS-only continuous integration service. He was Distiller’s co-founder and CEO.
Rob Zuber is a 20-year veteran of software startups, a four-time founder, and three-time CTO. Since joining CircleCI, Rob has seen the company through its Series F funding and delivered on product innovation at scale while leading a team of 250+ engineers who are distributed around the globe.
Rob: Hello, and welcome to The Confident Commit, a podcast for anyone who wants to join the conversation on how to deliver software better and faster. You’re listening to our very first episode, episode one. Today, we’re talking about the history and future of DevOps.
I’m your host, Rob Zuber, CTO at CircleCI, the industry leader for all things CI/CD. Today, we’re going to talk with someone who probably knows more about the history of CI and thinks more about the future of DevOps as a whole than anyone on the planet, my friend and CircleCI CEO, Jim Rose. Jim, thanks for joining me today.
Jim: Sure. I’m not sure that I think more about the history of CI or know more about the history of CI, but I’m happy to share.
Rob: Maybe you talk more about the history of CI than anybody else on the planet. What about that?
Jim: That’s probably true. I probably do need to talk about it more than most folks, so.
Rob: Right on. Well, we’re going to jump right into it, but for anyone who’s listening, paying attention, doesn’t know, Jim and I have worked together for a very, very long time. So this is a fun way to kick this off. We’ve worked together just over 10 years and joined CircleCI together through the acquisition of a company we had built called Distiller.
And so, we were thinking about CI and CD at that point. And interestingly going all the way back, Jim was the one who introduced me to CD. And we were building a marketplace, a consumer marketplace at the time and trying to figure out how to deliver effectively and get things in front of customers.
We were three people, Jim was responsible for product and I was writing code. And so to me, that’s a really interesting concept, Jim, because I think when people think about who’s going to be the one to bring CD to the table, it’s probably not necessarily from the product side. So as somebody who spent so much of your career thinking about product, what was it that really got you thinking about that?
Why was that interesting to you from a product perspective? And what was sort of your aha moment that got you into thinking about CI/CD and how DevOps would help from an overall business perspective?
Jim: Yeah. Well, I mean, I think from a product perspective, the thing that you’re always trying to figure out is fast feedback. Whatever you’re building, you don’t know if it’s going to work. You’re going to be right maybe a third of the time, half the time, if you’re pretty good. You just don’t know which half the time that you’re going to be right.
So you want to be able to build something and get it in the hands of the user so that you can get some feedback, so you can figure out how to tweak it and tune it and iterate and improve it over time, or figure out if it’s just not going to work. If you were just totally off track, kill it so you stop investing in it.
So when we were doing Copious back in the day, I, for whatever reason, found myself reading a lot of the Etsy Engineering Blog, thinking about continuous delivery and thinking, that seems like a good idea. I mean, if we can build something, we know it’s going to work, we should push it as fast as humanly possible into the hands of the user, because it will just help us make better, more informed and faster product decisions.
And so what started it all was basically looking at that, what was it called, the Code as Craft blog that Etsy used to publish, and just thinking about some of the concepts that they were focused on. And I think from a product perspective, that’s what you always want. I mean, the part that’s always frustrating on the product side is you just don’t know if the thing that you’re recommending is going to work.
I mean, all the data points might be completely aligned and you might think that you’re doing everything correctly, but then you put it out in the market and it lands with a thud. And then you spend all of your time trying to figure out why. So, the faster you can get it out there and the lower the investment amount is for that initial data point, the better off you are.
So, everything around agile and MVP and all of those various patterns, I think fit into the notion and direction of continuous delivery as a whole.
Rob: I mean, that all makes a ton of sense. I think it’s a really important point for both engineers and product folks to get to in terms of understanding how to learn faster. For folks listening, for us, that was about 2011. I think Etsy was talking about that just around that timeframe, but take us back a little bit further, where did all this start?
Who started thinking about CI and CD and started to really push it? One of the things that I’ve been thinking about lately is how long it takes from when these original ideas are seeded to the point where there are even early adopters, let alone mainstream adoption. So, where did it all begin?
Jim: Well, I mean, I think that the term CI was coined by Martin Fowler. And I think that was 2002 back when they were doing projects for ThoughtWorks back in the day and trying to help teams at least automate the testing steps.
So instead of having engineering teams that are taking a big waterfall spec and then running off into their own worlds and building in private and then trying to do the mega merge at the end, which of course never worked. Trying to create a process and ultimately an underlying technology that would allow for at the time, I think it was just daily merges.
It was go off, build whatever you’re building, but at the end of the day, make sure you merge your changes into the main line and then we’ll figure out if it broke or not. So that was 2002. And then I think they built that into CruiseControl, which was sort of the first open-source attempt at CI, mostly coming out of ThoughtWorks.
And then Jenkins started out as Hudson, which was the technology that Sun Microsystems used to try and build out the Java programming language, and they had all these remote engineers everywhere. And they needed a way to be able to get people to integrate back into the main line to figure out if it was going to work.
And so they built Hudson and then fast forward a few years, Sun gets bought by Oracle. Oracle decides Hudson doesn’t seem like something that they want to support. They let the main contributors break off, create a fork which becomes Jenkins. And then Jenkins is off on its own outside of Sun and Oracle and kind of doing its thing.
And then fast forward, you’re kind of like at 2009, 2010, 2011. And I think that is when the real kind of Cambrian explosion happens around cloud and around things like CircleCI, right? So CircleCI was started late 2011 before you and I got here.
And CircleCI along with about a dozen or 15 other startups all recognized that the practice of continuous integration and ultimately kind of the flicker of the idea of continuous delivery was a really good idea. And the folks that were able to do it had big bespoke, custom built tool chains at the Googles and the Netflixes and the Amazons and others.
And everyone approached it thinking, how can I take that concept and all of this great tooling and boil it down into something that kind of very small but technically sophisticated startups could adopt? And bake it directly into the cake and bake into their process right when they got started. And so that’s why you saw all those logos that came out 2011, 2012, 2013.
Everyone basically kind of rode on the wave of the adoption of cloud, lots of startups kind of leaning into that as a move, but eventually that world kind of cleans itself out because there was a lot of logos. There were a lot of folks. There were a lot of companies that probably were less interested in being companies which happens.
They were really interested in the technology itself. And then all of a sudden that whole market sort of consolidated away and now you have a lot of platforms.
Rob: Got it. So as a minor aside, I mean, I know you know this, but just to share our experience building Copious, we started just before I think CircleCI was started, so earlier in 2011. So we were using Jenkins and just the number of things that we did or had to do to make that work in terms of… As a fairly small organization managing parallelism and flaky builds and just all this stuff and thinking, gosh, there’s got to be a better way.
And then I remember CircleCI coming out and one of our engineers pointing it out and we were like, oh, if only 12 months earlier, nine months that would have saved us just a ton of trouble. So we’ve talked a bit in there about CICD and then cloud coming onto the scene, or really getting adoption and starting to shift things.
What else was going on that’s interesting to you from a DevOps perspective over that sort of 2000 to 2010 timeframe that got us to that position where everything just kind of hit at once in 2011, 2012?
Jim: I mean, I don’t know if it’s specifically DevOps. I just think it was the move into agile development. I think everybody realized, I mean, dating myself going all the way back to the late 90s when we used to build web based systems, it was still very much a waterfall based approach.
It was write a big spec, hire a big team, start writing custom code, oh, rack all the boxes yourself and just push these globs onto the system every three weeks. And that was how you did it, right? And I think everyone realized that was a really bad idea. I think the dot-com implosion made a lot of folks kind of contemplate how we were building software and how we were testing concepts.
And so I think that’s when you get the big move into just agile software development and agile planning. And as you start to break that concept down and you start to think about, well, how do I build faster? How do I lower the risk of the build? Well, then you get things like CI that happened as the kind of underlying technical process and platform necessary to kind of take that all the way through the engineering process.
And I think as you get into 2009, 2010, 2011 is when you really see the vast majority of software being cloud delivered, right? You move from let’s burn it on a CD gold master disc, and then we’re going to package it and shrink wrap it and put it on a shelf to, well, we’re going to release it.
And oh, by the way, we may update it. And at the time, if you were updating every week or two, you were moving at a tremendous pace, but for the most advanced shops out there, they were releasing dozens, if not hundreds of times a day. Even back then, if you think about the Facebooks and the Amazons and the others.
And so that shift into SaaS-based software delivery, or just software as a service in and of itself, and no longer being a physical object that had to be shipped, I think really changed the way that you think about what you’re giving the customer. The customer is buying into an ongoing relationship with you as a vendor.
They’re not buying something off the shelf and then buying a maintenance package to get patches that come down every quarter. They’re investing in a relationship between the software vendor and you as the customer. And it’s something that you’re reassessing probably every day or every week or every month.
And that relationship or that idea of everything being software based, I think is permeating everything now. I think that’s the big transition that’s happened over the last 10 years is that it was really easy in a world of bits and software. I mean, all of us were already thinking about agile development.
We were thinking about how we could lower the risk and lower the cost and the opportunity cost of development. But now you look at it and you’re like, well, that’s true about banking. It’s true about FinTech. It’s true about automobiles and logistics. It’s true about transportation, you name it.
It’s that same agile sort of continuous improvement approach that’s now hitting basically every industry and people are trying to figure out how to tool for that and realizing like, wait a minute, if I have a screen between myself and the customer, I have software somewhere in that mix.
And I should figure out how to be really good at delivering software, because that’s going to be the important glue and the important part of the relationship between myself and my end user. And I think the thing about COVID is that COVID basically made the importance of software so apparent and obvious to folks that I think probably had under-invested in the past.
If you think about it, if you were a bank going into last March and you didn’t have a really well-oiled software delivery machine with a really, really good app, and just all of the services kind of tied together, you probably fell behind over the last 12 to 14 months. And everyone is racing to try and catch up.
And the folks that already had that infrastructure and had that machinery in place are basically… It’s like they’re pulling away from the pack and pulling away from the peloton. And I think software is becoming more and more a marker of winners in various industries.
And either you’re a winner or you’re a loser, or you’ve got to figure out a way of kind of fixing it and improving your process to try and catch up.
Rob: Yeah. I could feel the product side of Jim coming out in there. I mean, ultimately I totally agree with you. I’m just saying that perspective is really in line with exactly what you said at the beginning. How do I get this stuff in front of my customers as quickly as possible? How do I make sure that I’m on the right track?
Because everyone is moving fast now. Everyone understands how to deliver software. So you have to be at the top tier of the top tier, not just like, yeah, we do CICD so we’re going to be fine. And really having that as the competitive surface.
I won’t point out any names or call anyone out, but I will say that over the last year, at least now I can properly take a photo of a check on my banking app because I swear I had the worst banking provider on the planet. They provide some other things that are important to me, so I’ve sort of stuck with it, but I think they stepped up their game; starting to feel that pressure, they made a lot of improvements in the last 12 months.
So, let’s go a little bit of a different direction over the course of all that. We talked a lot about tools and process. Do you think there was a significant cultural shift over all that time in terms of how people… I mean, you talked a lot about accepting the risk of putting things in front of customers early and just wanting to get that feedback really quickly.
And I sort of mentioned that that’s a shift for people. What else did you see over that time in terms of mind frame or mindset and the way that people had to change the way they thought about software delivery?
Jim: I mean, I think on the product side, the idea of being able to put something out that isn’t quite done, but what you’re really embarking on is a journey of learning, I think, is a hard transition for most product teams. I mean, even internally with us, right?
I mean, a lot of what I do, because I currently run product for the time being, is just giving people the confidence that you’re going to put out a version, it’s not going to be complete, but just be really clear about what you want to learn. And assuming you get those data points, we can direct what’s next along that journey for that particular feature or pull it, right?
I mean, it’s possible that it just doesn’t work. But I think for a lot of folks that doesn’t come naturally, right? I mean, you want to put something out that’s a great professional reflection of your work and of your craft. At times, you’re going to put something out that is probably less than refined and less than finished.
And you can feel like that may be difficult, right? And so I think for product managers and product teams and designers overall that can be hard. And then as well in engineering, I think that can also be difficult. And it’s especially acute in our space, when you have software engineers building tools for other software engineers, their colleagues, their friends, and that really does become something that’s very precious.
And it’s something that you feel very strongly you want to make sure that you get right, but you have to be willing to let go. But you can’t just let go and just ship bad product. What you want to be able to do is let go and basically say the goal isn’t to be done, the goal is to be right.
And how can I learn along the way so that I can make sure that I’m building a better and better experience for that end user over time. And it doesn’t all have to happen at once. I think the transition that’s happening from a software delivery perspective is it’s the same effort of letting go.
And by that, I mean, so I think what we see with customers is that the journey that customers go on when they get started is we are helping them. When they first come on to CircleCI, we’re helping them automate their existing process. I have a big monolithic app. I’ve got a bunch of manual process or home-built scripts or kind of open-source or kind of glued-together platforms in place.
They’re all breaking. I just need to come up with a way of automating what I already do, but I’m monolithic and thinking I have one app and it’s all packed together and it’s explicit. I understand what the architecture is and what I’m building and how I’m shipping it. And what we help them do is accelerate what is already an existing process.
And maybe you take that, and in the past you did it once every two weeks or once every week. And now all of a sudden you’re doing it dozens if not hundreds of times a day. You’re just able to, every time there’s a change, you’re able to integrate that change into an existing process or an existing system. And that’s awesome.
I mean, it’s a tremendous uplift from a productivity perspective for those teams just to be able to move at an increased pace. But I think what we find is that as people are accelerating and as they’re adopting cloud-based infrastructure, you have to kind of go back and fundamentally re-architect what you’re actually building and how you’re shipping it to the end user.
And I think that that happens in two main respects. I think the first thing is that, so as you’re building out a monolith it gets bigger and bigger and bigger, and you’re going faster and faster and you’re injecting more and more changes, but the monolith does naturally go slower and slower because it’s just a bigger thing that you have to build and test.
It’s like if Moby-Dick were three pages, it’s super easy to run spell check on it. But if Moby-Dick is 500 pages and you’re only changing page 38, do I really need to worry about page 487? Probably not. And so you take what is this big monolithic application and you break it down into its component pieces so that the smaller teams can run faster and maintain velocity.
And that’s awesome. And that’s microservices, right? But on the flip side, you have the infrastructure engineers who are trying to make those same choices and basically saying, well, if I don’t want to run, take, for example, I don’t want to run Mongo. We use Mongo as a data store. I don’t want to run it. Sure I could run it.
I can download the open-source version, put it on a bunch of AWS instances, but there’s this company called Mongo over here that actually knows how to run Mongo. So why don’t I use their Atlas-based service? So then all of a sudden I have to start teasing apart the infrastructure layers and pulling those pieces apart and making those available through services.
Basically, the way I think about it, it’s like a monolith is like a pillow. And then all of a sudden you’re pulling all the stuffing out of the pillow to try and make it better and to try and make it faster. I think the operators are the first ones that have had to have kind of a reckoning with what that means, right?
From an operator perspective in a monolithic world, it’s a world of gates. It’s a world of everyone needs to adhere to my standards. We need to run a certain way. And if you don’t run this way, you can’t come in. And if you’re only releasing once every three weeks or once every three months, operations can kind of get away with that.
You can kind of say, listen, you play by these rules or you just can’t get on the field. You can do that, but when the market is moving faster and faster and faster and faster operators have to let go. They have to basically say I can’t control everything, but what I want to be able to do is establish the rules of play and say, you have to have systems that are observable.
You have to know that they meet certain requirements. You have to be responsible for running what you build. And here is the list of all the responsibilities that you have to put something into production.
And I think that’s why you see this big move from monitoring and this idea of I can control and know everything to a world of observability, which is I have no idea what actually is in production, but I can observe the system and figure out when it starts to run out of bounds.
So you go into this world of having really kind of tight hooks in and trying to control into this world of like, we’re going to watch it. And if it starts to run in a direction that we don’t believe to be sustainable or healthy, we’ll roll it back and we’ll get back into a good state and then we’ll figure it out.
And so I think operators are pretty far along in that journey, or at least some operators and some shops, the folks that are furthest along, I think the operators have been able to let go. I think what’s happening on the software development side is that the developers have to start letting go. So developers in a world of a monolith, their architecture is explicit.
It’s written down, it’s in code. It’s like, this is the thing I’m building. This is the thing I’m shipping. I understand it, and can reason about what’s in the box. But the minute you start to tease it apart and you tease it into all of these service meshes and all of the services that are dependent on other services around them and they’re all moving independently, all of a sudden, there’s no one inside of the company that can sit down with a whiteboard and basically whiteboard out your architecture because no one has any idea.
They’re all moving independently. And so from a software engineering perspective, I think we’re in this period of reckoning with that. The first thing that I think everyone is reckoning with is this idea that everything has to go through the Git repository, like version control everything and do everything in GitOps.
Everything is version controlled. And it’s like, well, if I knew what everything was, I could just have it in a monolith, right? If I actually understood what everything is, I could just write it all down. I could test everything, but the reality is, I don’t know what I don’t know.
And so the idea that I can write it down and be predictable and deterministic and have it sitting inside of the system, I think is anachronistic. I don’t think that that’s the world we live in anymore.
And so developers, I think, are trying to figure out or being forced to reckon with the idea that you have to start letting go and be comfortable with the idea that not everything is going to be sitting in code, but I’m going to be responsible for not just my code base that’s running, but all of its dependencies around it and the application itself.
It’s no longer just code, it’s the whole experience that I have to ship. And I think that’s when we think about the customers that are on the furthest end of the journey, that’s the thing that they’re wrestling with, is just like, it’s not a code base anymore.
It’s this amorphous application service thing that breaks in all these weird ways that I don’t always 100% understand, but I’m 100% accountable for. So how do I put up guardrails and controls? Not so that I can slow it down or stop it, but more so that I can shape it and direct it.
And I think that is the wrestling match that’s going to happen over the next several years. And I think it’s only going to get more acute as more and more services become more ML and AI based, where it truly is like all I’m doing is defining a model and a set of rules. And the software itself is figuring out what it should be doing and training itself.
Then I really am, from a software engineering perspective, letting go and saying, I can establish the ground rules and the boundaries, but the software itself is going to figure out how it optimizes its activity. But again, I think that’s much further out, but I think that that’s the reckoning that we have to go through from a software perspective.
Rob: I know this is one of your favorite expressions and it works really well for me because I’m Canadian, which is you’ve got to skate to where the puck is going, not to where it is. And a lot of what you’re talking about to me is this is happening. This is coming because we’ve already made decisions that have set these wheels in motion.
And now we need to get ahead of it and put in place practices that are going to make it viable and safe, give us confidence to operate in that world as opposed to trying to resist it. It’s like trying to resist a river, it’s going to carve its path. To your point of guiding, you can put a few stones here and a few stones there and try to shape it a little bit.
But yeah, this is happening and we need to sort of get ourselves in front of it and think about what the next step is so that we can say, okay, we’re actually ready for that. As opposed to, oh, we’re playing this constant game of catch-up of wait, the developers just did what? They just deployed what?
Now I need to figure out how to operate that environment. What tools can I get? And doing everything sort of as an after-the-fact scramble. So is there anything else? You mentioned at some point just the forcing functions of, let’s just call it 2020, right? Everybody being distributed, everybody using that digital interface to connect into the system.
Is there anything else that came out of last year that really is starting to shape how you think about where we’re headed in terms of getting in front of it, like trends that maybe weren’t as obvious until we were all in the pressure cooker?
Jim: Yeah. I mean, I would say the COVID adjustment is that I think everyone woke up to the fact that software is a supply chain now, and it’s a supply chain problem. So we’ve been living in this illusion that software is a single-player game.
That most software is still custom made and that the developers are taking the requirements and basically handcrafting and building a perfectly encapsulated application that they can own and control and understand on each and every line of code. And I think that that world is gone, right? If you think about applications today, they’re these pastiches.
They’re these amalgamations of open-source software and third-party services that you’re using custom code to kind of glue together into a representation of a service that you want to support and that you want to present to the customer, but you’re writing less and less of it. And you’re assembling more and more of the pieces.
In a world of assembly, the benefit of assembly is that I don’t have to reinvent the wheel each and every time, which is great. That’s why I think you’ve seen so much acceleration from a delivery perspective going from zero to one. Teams are just able to go to zero to one so much faster because it’s like, I don’t have to figure out how to build pagination.
I don’t have to figure out how to build winnowing filters on the catalog because I just go pull down a bunch of libraries, put them together and it works. Or I don’t build payment anymore. I externalize all of that out to Stripe or Adyen or somebody else. And I don’t even have to build any of those pieces anymore.
I’m just communicating to a fully externalized service. So that’s awesome. The downside of it is that I don’t actually control most of the inputs into my application anymore, which is problematic if you allow it to be. In a world where you don’t own the inputs, what really becomes important is not the custom component or the single-player mode component.
It’s a supply chain. It’s can I monitor all the pieces that are being glued together and are coming inbound into my application in a way that both based on my experience, which is important if I try and integrate one of those libraries or one of those services and it breaks because I messed it up.
Or I don’t do something correctly from a software engineering perspective or from a product perspective, being able to identify it, fix it and get that process into a green state is awesome. But all of those inputs and all of those services, they’re changing all the time too. And changing in ways that they as service providers or as open-source maintainers believe is right.
They are doing the right thing. They believe they’re doing the right thing, but it’s a complex world out there and it breaks all the time. And when it breaks, it breaks for everyone. It’s no longer just a single-threaded process, it’s a web. And when one of those nodes goes bad, or one of those changes goes bad, the whole web shudders, the whole thing shakes.
And when it shakes, people start breaking. And so, I think from our perspective, we’ve always looked at that and said, you know what? If it’s a supply chain problem, it’s a network problem. And the more we understand about the network, the more information and intelligence that we can give users of our network to make better choices.
And better choices might just be, we saw this dependency that you’re currently using. By the way, it broke the last 10 times that people tried to build using the latest version. Chances are your next build is going to break because well, the last 10 did. And so if it does, look here first. I think that’s where we’re trying to get to right now.
But I think longer term where I think the market goes is that that’s going to become less reactive and it’s going to become more proactive. Where people are going to make… Things are going to go wrong in the supply chain. Things are going to go sideways. There’s going to be internet weather. And then we’re going to learn what’s causing that internet weather.
And then we’re going to be able to proactively go out to projects and basically say, we prevented you from doing X because if you did it, you were going to break and you were going to end up in a bad state. And so we’re going to help monitor this on your behalf. So I think that’s where software is going.
I think all of these supply chain attacks, whether it’s the dependency confusion attack that got launched a few weeks ago, whether it was what happened with SolarWinds. There was just something that happened, I think earlier this week or last week, where one of the big language environments got left-padded again when somebody pulled a dependency and everything snapped.
I think people are waking up to the fact like, oh boy, all of this stuff is deeply, deeply interwoven. And how can I monitor the things that I don’t necessarily write or control in a way to make sure that my build gets more efficient and better over time?
And so I think that’s going to be what COVID has brought to bear is it’s made that problem very, very obvious. What the solution is, I think, for all the various vendors across the board is the journey that everyone’s going to be on for the next three, five years.
But a lot of what I did even before we worked together, right, is I did a lot of work in the consumer space and did a lot of work in ad tech. And if you think back to 20 years ago in ad tech, or 25 years ago at this point, dear Lord, if you think back that far, when I wanted to traffic online ads, I used to have this really, really complex stack of tools.
I would go off and buy my own inventory on a collection of websites. I’d build my own creative according to hopefully some set of standards with my own graphic designers. I’d have my own trafficking server, I’d have my own analytics server. I would have all the monitoring to make sure that I was actually getting what I bought from the sites that I was working with.
And then I would have all of these underlying tools we’d string together to figure out: if I spend $10 in online ads, what did I get from it? And then AdWords happened. And Google was like, well, we already know what the user’s looking for. We know the creative that works, that connects with that user.
What you need to tell us is how many users you want and how much you’re willing to pay. And then we will do all the work to figure out how to balance that in the system. And you don’t necessarily have to worry about the underlying tooling.
In fact, we’ll give you the tooling for free if you want to look at it and do all the analytics and everything else, because we already have the spend. And I think that’s what’s going to happen in this world, is that the intelligence is going to outweigh the tooling itself, but the customers are going to want access to the tooling because they’re engineers, right?
They’re going to want to be able to get under the covers and play with it. But I do think that the intelligence is the hard part. But one of the things that you mentioned before is that for operators, this world of going into cloud has been really tricky. And certainly in COVID, it’s been a complicated transition for a lot of folks.
It’s laid bare a lot of manual process, a lot of custom built stacks where you’re like, that’s not going to scale. And it’s forced people to figure out how to automate a lot of those pieces. But I think that the flip or the inverse is true as well, which is, there’s been a lot of development teams out there that have been saddled with and have taken on the responsibility of running what they build, but are realizing they have no idea how to run it.
And they don’t have any of the tools. They don’t have any of the monitoring. They don’t have any of the intelligence necessary to make good decisions as it’s making its way into production. And so everybody’s always talking about it and I never can figure out from a Zoom perspective, whether I’m going the right way.
Everybody’s talking about shift left and putting more and more responsibility into the hands of the engineers. But the engineers, I think increasingly, the product engineering teams and application engineering teams are trying to figure out how to shift right. That if you’re going to make me responsible for this, I need data.
I need tools. I need to know whether this stuff is working or not. Because if it breaks, I’m the one who’s going to get paged for it and have to fix it. And I don’t want to have to spend two hours actually trying to dig out logs and figure out what actually happened. I need that data on a much more immediate basis. So I think people have lost track of that.
Rob: You referenced this a little bit in your ad tech analogy, which I think is a great analogy. And I love that you comment on dating yourself on some of these, because we’re from the same era, I’ll just say. But that point that AdWords and others enabled you to do all this stuff at a very high level, but people still want access underneath.
And I think that happens to the data to really understand what’s happening. And I think probably every software engineer has been burned at some point by the fact that they were massively accelerated, taking tools off the shelf. And then all of a sudden, one of them broke and they just had no idea what was happening below.
And trying to figure that out while your site is down is not fun for anybody. And so I think that’s exactly what you’re talking about. Okay, cool. It’s actually easy for me to build a service and put that in production, but I actually need to know what’s happening. I need to know what everything is that supports that, so that when something doesn’t interact nicely, yeah, I’m actually in a position to do something about it, right?
So I think that that is kind of right at the… I love to rant about all the ways that we’ve taken DevOps and used it as a different word or whatever. But if we go back to the heart of cats and dogs playing together nicely, the developers and the operators actually working together to solve a problem, I feel like this is at the heart of that evolution in the sense that we’re continuing to learn to sort of take ownership of different parts and also share models of how we work.
I think infrastructure as code is sort of operators, or now SRE teams, it depends on the organization you’re in, taking the part they were responsible for and acting like a developer. And you’re talking about developers owning things in production, carrying pagers and being responsible and saying, wait a second, where’s the monitoring system or observability?
How does this work? So what do you think is next down that path? As we sort of throw ourselves into the deep end from a development perspective and start running these systems in production, what else are we going to need to do that successfully? And how does that ultimately support building great products for customers?
Jim: Yeah, I think from a software development perspective and from a product management perspective, it means you’re going to have to be able to pierce the veil and go all the way into production. You no longer have for the majority of services, this division between what’s happening pre-production versus what’s happening in production.
You’re going to have to be able to see it all the way through.
I think the other problem that comes out of these microservice architectures is that what used to be best practice from a testing perspective is, well, if I could make pre-production and production direct mirrors of one another, then if I tested in pre-prod and it works in pre-prod and it works in staging, I know if I deploy it correctly in the same way every time, it’s actually going to land in production and going to work once it gets there. But in microservice architectures, you just don’t know that anymore because you don’t know what the state of all the services is and where all the dependencies are.
So you’re going to have to figure out how to be able to roll into production slowly, get present, get deployed into production and then start to release and put traffic at that service and figure out if you’re in a good state. And if you’re not in a good state, be able to roll back quickly and get back to a level, right?
And I think most services are going to have to go through that transition over time. It’s not going to be easy. I think it does put a tremendous amount of onus and pressure on observability and really actually understanding what you’re dependent upon as a software team and as a software application.
Not all of that comes naturally. I think that’s going to be a learned skill that most teams are going to have to wrestle with. And then I think also from an operations perspective, I still think that that process of letting go is really important.
Which is, instead of going from firefight to firefight, trying to figure out why that load balancer went bad or that Kubernetes pod went bad, to your point about SRE, it’s going to be about trying to create a set of rules and expectations and tools, and then making them available and helping the teams adopt them and then use them so that they can support the services that they want to support.
And so you kind of go out of just pure firefighter mode and go into something that’s a little bit more of an enabling stance where those teams can actually run what they built and actually successfully match that. I think the other part though too, is that all applications are, not all, but many of them are just going to behave differently.
There’s still going to be applications that you’re going to deem to be mission critical that are going to have much slower release cadences and much more onerous hurdles to clear as they make their way to 100% of traffic versus things that are less risky, right?
I kind of think about, so we have these things in our growth group in product; they call them 15-minute experiments. It takes 15 minutes to build. You put it into an A/B test and you put it out into the hands of the user. And it’s usually easy stuff, right?
Or smaller things. It’s like, well, let’s change something in the UI. Let’s change some copy. Let’s change the ordering of these particular elements. But they’re small, easy things to build. They’re great for new engineers that are coming in and being onboarded, but it’s also a great way to exercise the toolset and figure out are we able to track things?
Are we able to monitor? Are we able to observe? And what happens? At the end of the day, you hope that it goes in the right direction. But because it’s only 15 minutes, you can roll it back really easy. You can get back into a good state and it doesn’t cost you anything, but hopefully you’ve learned a lot along the way.
On the flip side, when we’re making fleet level changes, you don’t want to be doing 15 minute experiments and having things fail and having people’s builds just blow up, right? So we have a totally different cadence there and a totally different process to figure out whether something’s going to work.
You watch some of those long running canaries go into production and they might run for two weeks before people feel confident that they can really start redirecting more traffic at that system. And I think teams have to go through that inventory and go through that thinking in that process to figure out the stuff that can move fast, we want to tool it up so it can go as fast as humanly possible.
So we can learn as fast as we can. For the things that might benefit from going a little bit slower, we want to make sure that we have all the monitoring and all the systems in place. So that again, we’re always learning and getting fast feedback, but it might just take a longer incubation. It might have to bake a lot longer.
And people have to go through that process. I think the thing that we found with COVID is that we’ve gotten all of these folks that have never done continuous delivery before, and they’re coming in and they’re reading everyone’s blogs and like, yeah, we should do test in production. We should be doing this, that or the other.
And I think realistically, what people have to do is sort of step back and say, what am I building for the end user? Where are the appropriate guardrails? What is the appropriate process for each level of the stack? And then how can I build it and support it on one platform?
And that’s the thing that we’ve been trying to build is just to say, all of these processes should be built on one common framework, one common point of view, because they’re also deeply interdependent. You can’t really tease them apart.
In a world where everything was monolithic and running on its own, yeah, you could run them all on different fleets, but now that’s just not possible. So I think that’s where you watch a lot of these folks that have had challenged digital transformation efforts in the past. It’s been this point of view of, oh, we’re just going to take everything apart. We’re going to start from scratch.
And then they try and do that for nine months and all those projects fail because it’s like, you can’t take the 20 year old monolith and break it into microservices. You can maybe tease a few pieces out of it and reduce the footprint of the monolith, but eventually the monolith is going to be its thing.
And then you want to wrap it and make it a dependent service and then have all your development happening outside of it. But it’s just not going to go away, but you have to go through that journey and that set of learning. And I think everybody’s at a different point of that distribution and a different point of that journey.
And I think the challenge for us is trying to support all those customers. But I think the market overall is just you’re going to end up there. It’s just a question of how fast and how painful it’s going to be to get there along the way.
Rob: So as we wrap up here, there’s something we’ve talked a little bit about and I think comes out in what you’re saying, all these teams trying to figure this out. It’s complex. There’s a lot of changes. One of the things we’re starting to witness is a little more prevalence of this engineering efficiency, developer productivity, developer experience team, or person, depending on the size of the org.
In a few brief words, what’s your take on sort of why that is becoming so I guess, common and prevalent now? And where do you think that team is headed?
Jim: Well, I think it’s becoming so prevalent now because developers are so expensive, right? With your development team, almost all your cost is in headcount and everything necessary to support your development team. And so you want to make them as happy and productive as you possibly can. So I did the math, and the number that we came up with is that it’s $1.45 per minute to support a developer in North America.
And that doesn’t include management overhead and everything else. That’s just somebody showing up, turning on their computer, sitting down and doing work. And so the meter’s running. The minute that somebody shows up, you better figure out how to make sure that that money is spent wisely.
And in a world of, well, everything is locked down from an operational perspective, we are in a world of controls and gates, then I’m going to dictate, from an operational perspective, what my developers have to use. Here’s the list of requirements. And oh, by the way, here are the tools that you can use to be able to hit those lists of requirements.
But I think what we have found is that as the world is accelerating, the operations teams are letting go and getting into more of an observability stance and more of a stance of education and monitoring. Not monitoring, but observing and development teams are going through that same transition.
But you want your development teams to go off and explore and find what works best for them to a point, right? You want to encourage development teams to go off and look at new technologies, look at new processes. Look at what’s next. What’s over the next hill because they’re the closest to the problem.
They’re the ones who are actually thinking about like, there’s these cool libraries, there’s these cool services. What if we organize our code or processes this way, what would that unlock? And you want to allow for the teams to go off and do that. At the same time, there are just some things that are best practice and can afford to be standardized inside of those organizations, right?
And so when you establish or you identify, well, that looks like best practice. And if we took it out of team one, encapsulated it and made it available to teams two through 50, everyone is going to benefit. That’s when you get these developer experience teams who are trying to create, establish and pull best practice from all the teams, pull it up together and pack it together and make it available for everyone else.
So that those other teams can ultimately be more productive and focus on whatever they’re building or whatever’s making those teams special. So you have to come up with a way of re-consolidating and re-standardizing whatever best practice is.
I think the difference is, and the thing that I always think about is that when you talk to customers, customers are always like, we want to build like Netflix, or we want to build like Amazon, or we want to build like Google, or we want to build like Facebook. And when you go and look at all of those companies, what they don’t do is they don’t consolidate.
The idea isn’t we’re going to jam all of you on one tooling structure and one process, and everybody’s going to be forced to do it the exact same way, because this is the way we do it at big FAANG company X. It’s, well, here’s the common set of tooling that we have available. Here are your responsibilities.
By the way, if you use our tools, you’re going to meet all of those core responsibilities without having to do anything. But you can do whatever the heck you want. You can go off, and if you’re in a particular new area and we just don’t have a pattern for that, figure out what the next great best practice is.
Just make sure you’re keeping an eye on these responsibilities and then we’ll keep an eye on you. And when you find something that’s reusable and fungible, and ultimately it’s going to accelerate all the other teams, we’ll take it and bake it back into the cake and make it available for everyone.
And that’s, I think why we’re seeing that practice and those teams come up because it’s the only way that you can generate the leverage that you’d need across these big, expensive, but incredibly strategic R&D organizations. And so, yeah, I mean, Netflix talks about the paved road, right? And that’s the idea.
It’s, we’re going to pave a road. We’re going to make it super easy for you, but we’re not going to prevent you from going off-road. If you want to go off-road, just bring enough water, make sure you have gas and air in the tires. And you should probably get back on road at some point, but God bless you, go do your thing and we will help support you over time.
Rob: Yeah, I love that message. I love that theme. I could talk about developer productivity all day and all week and probably anyone listening to this knows that we do. But I’m going to wrap it up there. Here’s what I will take out of that. First of all, don’t just copy the big folks. They probably have different problems than you have.
And second, pay real attention to what they’re doing and how they’re thinking about it. Don’t just take a snippet and try to use that as your solution. Awesome messages. Thanks for joining, Jim. Always a pleasure to chat, and thanks to everyone else who joined and listened. If you like what we’re talking about, share with your friends, subscribe on your favorite platform.
And if you want us to talk about something that you’re interested in, maybe something we do, or just want to hear me talk to someone else about something in our space, then hit us up on Twitter at CircleCI. Thanks again. Thanks Jim.
Jim: Have a great day.