Observability: improving speed and reliability with Ben Sigelman
Sep 3rd, 2021 | 42 minutes
Ben Sigelman is a co-founder and the GM of Lightstep, a co-creator of Dapper (Google’s distributed tracing system), Monarch (Google’s metrics system), and the CNCF’s OpenTelemetry project. Ben's work and interests gravitate towards observability, especially where microservices, high transaction volumes, and large engineering organizations are involved.
Rob Zuber is a 20-year veteran of software startups, a four-time founder, and three-time CTO. Since joining CircleCI, Rob has seen the company through its Series F funding and delivered on product innovation at scale while leading a team of 300+ engineers who are distributed around the globe.
Rob Zuber: Hello, and welcome to the Confident Commit, a podcast for anyone who wants to join the conversation on how to deliver software better and faster.
Rob Zuber: You're listening to episode 10. And today, I'm talking to Ben Sigelman, co-founder and CEO of LightStep, or possibly former CEO, now GM of LightStep, which we'll get to in a little bit. Today, we're talking about observability and how it connects with delivering change with confidence. I'm your host, Rob Zuber, CTO of CircleCI, the industry leader for all things CI and CD. Ben, thanks so much for joining me today. It's great to see you again.
Ben Sigelman: Yeah, you too. Thanks for having me, Rob. I’m looking forward to this.
Rob Zuber: Right on. So, let’s talk LightStep for a little bit. Let’s talk observability and all the things that go around that. Can you just give me a quick summary of what LightStep is and where you focus?
Ben Sigelman: Sure. As I was saying to you before we started recording, Rob, I consider this to be 99% about being interesting and useful and 1% about selling stuff. So, I'll keep the pitch really quick, but LightStep is an observability provider that's designed for cloud-native environments. We started life focusing on tracing, although at this point we do a lot more than that, and we are open for business. If you want to buy something, you can go to our website, but let's talk about observability, so.
Rob Zuber: Yeah. Well, let’s do that. So, you just said right there, we started out focused on tracing, and so I think there’s logging, there’s monitoring, there’s tracing, there’s observability and maybe some of us or maybe it’s just me are a little confused about where all the pieces fit together. So, why don’t you just give us a little sense of that in terms of is this a natural evolution? Is it an umbrella of a bunch of different things that we do? Are we heading into new territory? Like how do you think about observability as a whole?
Ben Sigelman: Yeah, it's a really good question, and I think I'll start by saying that if you pick your favorite 10 people who like talking and writing about observability, put them in a clean room, ask them to define observability, and then compare notes afterwards, I think you'll find a lot of divergence on what observability is for, what it is, what it's made out of, that sort of thing. I will say a few things I feel really strongly about. For instance, observability is absolutely not the presence of logs, metrics, and traces. Those are just the raw materials. None of the three is a value proposition or a problem to be solved or anything; it's just data. So, observability has to be about something that actually solves a problem for an organization. The way I think of the space… Since I know this is a podcast, I'll resist the urge to show a slide that describes this, but there are really three layers.
Ben Sigelman: At the bottom layer, you do have the telemetry, the data itself, and observability without high-quality telemetry doesn’t work, so that’s definitely important, but getting that data is really just step one in the process. And once you have the data, you have to put it somewhere and then do something with it.
Rob Zuber: Mm-hmm (affirmative).
Ben Sigelman: [crosstalk 00:03:19] putting it somewhere is an area where there's also a huge, huge amount of innovation that needs to happen. Right now, the data stores are often siloed depending on what type of telemetry you started with. So, I have a data store for logs, a data store for time-series metrics, and maybe a data store for traces if you've gotten there. That's a fundamentally problematic way to build things, because it forces user experiences to accommodate those silos. There is a future where they converge, and also a future where the data layer is more configurable from a value standpoint.
Ben Sigelman: I think a lot of folks who are doing observability find that it's very expensive, and that's not because people are trying to rip you off, it's just because there's no control for most observability end users over the ROI knobs for the data. We can talk about that more if you want. It's kind of an interesting subject, actually. And then finally, we get to the benefits, like why does anyone do observability? And there are really only two benefits to observability. One is to improve reliability, and the other, very much in the kind of CircleCI world, is developing new functionality faster and getting it out there faster, right? So, you're either improving velocity or improving reliability or both. Of course, the easiest way to improve velocity is to stop caring about reliability, and the easiest way to be reliable is to stop caring about velocity, right? So, observability is something that can allow you to do both. So, it's about planned changes and unplanned changes and those two worlds.
Ben Sigelman: And then monitoring is an important part of observability. It’s not being replaced by observability. I’ve written kind of extensively about that. Monitoring is just a subset of the observability problem. It’s where you find things that you know have business risks and you monitor them proactively so that you can get ahead of issues, but observability is what makes that monitoring actionable, again, whether you’re making planned changes or reacting to unplanned changes. So, that’s really how I see it. There’s those three layers and for better or worse, I think the industry is still very early on in the process of building out that entire landscape, and it’s just starting to converge in the past couple of years.
Rob Zuber: So, you covered so many points in there, I've probably got questions for the rest of the podcast episode; I'm excited. But let's talk about reliability first. I really appreciate the interplay between reliability and velocity, and being able to sort of turn that upside down, I think, is really interesting, but let's take it from the point of reliability for a second. When I hear "improving reliability," I usually think that's making my software more resilient or whatever. I guess there are a lot of pieces: if you just look at an availability number, there are a lot of things that go into that, right? When I have an incident, how much of that time was spent in different areas? Or how many of those do I have in the first place? Like, what caused them? And so, when you think about the role that observability plays specifically in increasing reliability, are there areas of that… There are things that come to my mind, but how do you think about the overall improvement of reliability and what observability really does for you in that space?
Ben Sigelman: Great question. I guess I'll go back to something I said before: you have to be reliable both through planned, intentional changes and through unplanned, reactive incidents, right? Those are the two areas where reliability ends up being a problem, and observability has a role to play in both. Let's talk about them separately. So, for planned changes, I guess underneath all of this, I'm assuming that we're talking about a large application that probably has dozens, if not hundreds, of separate teams developing functionality. If that is indeed the case, then you have… And especially if people have successfully adopted CI/CD and so on and so forth, which I think is the whole point of modern development practices, to allow for that kind of velocity.
Ben Sigelman: You're dealing with a world of constant change. Like a lot of LightStep customers… I think I've written about this publicly, so I can share this: a Twilio or a Spotify or something, which are both LightStep customers, will develop software such that they make intentional changes many, many thousands of times per month, right? So, you'll have like five or 10,000 deploys per month in some of these applications, right? Which is really a tremendous amount of change, but the thing that's so hard about it is that the purpose of these architectures, microservices, et cetera, was to allow for those changes to happen independently. And while the teams are able to deploy independently, that is true, the application is one application, and so in that sense, all of these many teams are totally codependent, and yet they're not even aware of each other. They oftentimes don't know each other's names, much less understand how their software is connected.
Ben Sigelman: So, every one of those changes introduces some small amount of risk and you don’t have to spend very long with the math to realize that even if you only have like a 1% risk per change, if you make that many changes per month, you know that some of them are going to go wrong, right? And that has side effects that often affect other teams. So, the reason observability is so difficult and so important is that it’s the way that you can navigate across service boundaries and across team boundaries to help connect those two people, the person who made the change and the person affected by the change to reach a resolution. And if you have to do that manually, it’s just not effective, right? And monitoring would be doing it manually, like you can find out there’s a problem very easily, but to connect those two teams requires a lot of automation and data engineering, which is where observability comes in. So, does that make sense?
Rob Zuber: Yeah, absolutely. Absolutely.
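Ben's back-of-the-envelope math compounds quickly; a minimal sketch of it, using the 5,000 deploys/month and 1% per-change risk figures from the conversation:

```python
# Back-of-the-envelope deploy risk: a small per-change failure
# probability compounds quickly at high deploy volume.
def expected_failures(deploys_per_month: int, risk_per_change: float) -> float:
    """Expected number of bad deploys per month."""
    return deploys_per_month * risk_per_change

def p_at_least_one_failure(deploys_per_month: int, risk_per_change: float) -> float:
    """Probability that at least one deploy in the month goes wrong."""
    return 1.0 - (1.0 - risk_per_change) ** deploys_per_month

# 5,000 deploys/month at a 1% per-change risk:
print(expected_failures(5_000, 0.01))       # 50 expected bad deploys
print(p_at_least_one_failure(5_000, 0.01))  # effectively 1.0
```

Even cutting the per-change risk by an order of magnitude still leaves failures a near-certainty at that volume, which is Ben's point: the question is not whether changes go wrong, but how quickly you can connect effect back to cause.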
Ben Sigelman: For the reactive case, it's almost like you take that same thing but flip it over. So, instead of thinking about the person making the planned change, now it's the person being woken up. The change, unfortunately, might not just be one of their colleagues, it could also be one of their downstream cloud providers, or honestly, it could just be an end user. Especially B2B SaaS companies will often have programmatic users that can change their behavior by 1,000x overnight. I wouldn't be surprised if CircleCI had a few war stories from a customer that suddenly did something really, really unexpected, and that can create a lot of contention in a multi-tenant system, right? So, uncovering which specific customer changed their behavior and created a choke point, that's a really hard problem to solve if you're doing it manually. And observability needs to take the signal you're looking for, like an SLO or something like that, and trace it back to some change that was made either by a customer, a downstream service provider, or another team, but it's really just unwinding the same multi-service, multi-team cause-and-effect story through data engineering.
Rob Zuber: Right, right. Yeah, that makes a ton of sense also. We spend a lot of time thinking about change at CircleCI, and we tend to be more in line with the thinking around what you're calling intentional change, at least driving change through software, which for a good part of my career, let's say, was the majority of where your change came from. Whereas now, it feels like it's a tiny fraction of where your change comes from, right? In terms of third-party services you're using, third-party libraries, even the amount of our own stacks that we ran ourselves. So, even the impacts of something I may have pushed out, but it came in the form of a library that I upgraded, and being able to trace that back and understand, "Oh, okay, this is happening." Or some new data came in, or to your point, some customer of mine just decided that they were going to do a lot more this month than they did last month and started doing things in maybe unexpected ways.
Rob Zuber: So, I think a lot of that speaks to just complexity: our overall ability to reason about the systems that we build and how they interact with other systems. And I guess from the perspective of observability really surfacing that, right? Giving people tools to get their head around things, maybe at a higher level of abstraction, because the system is so complex that I might only understand one part of it in any depth. Do you think that on the whole, we are increasing that complexity generally for the good? I guess that's the best way I could ask that question. I see a lot of increasing complexity, and we're building a lot of tools to manage that complexity, but are we primarily solving problems and then just dealing with the repercussions? Or do you think there are areas where maybe we're overdoing it?
Ben Sigelman: Yes.
Rob Zuber: All of the above?
Ben Sigelman: No, definitely the latter is true. If you look across the industry as a whole… I don't know, I'm not the first one to point this out, but I love the CNCF, but if you go to the CNCF's landscape and just literally look at it, it's the landscape of all cloud-native technologies. And over the past two years, since they started tracking that stuff, it's gotten to the point where you quite literally, even on a Retina display, cannot read a lot of the logos, because there's so much stuff on the screen. Right? So, in the macro sense, like, "Oh my God, yes, a thousand times over, there's way too much complexity." That said, there are organizations that I think have found a way to create a paved-path approach and have the proper, not just technical, but human enablement for their teams, so that you have this prix fixe menu of how you can do things at such-and-such organization.
Ben Sigelman: And that can be a relatively sustainable, reasonable experience even in the face of hundreds of microservices. It's just a question of how much freedom you give people. I think I did a talk about this a few years ago where I kind of talked about it as hippies versus ants, right? I love hippies, and at some level, I think of myself as a hippie deep inside. But if you want to have distributed teams operating independently, be careful that the independence doesn't include letting them make all of their own decisions and have total, total freedom or autonomy, or you'll end up with as many different tech stacks as you have service teams, and that doesn't work. It's more like ants, where you do want to have a sort of independence, but a lot of regularity and uniformity in the tools that are chosen and the way that people do things, so that there's at least some sort of hope that you can get a relatively uniform surface for observability and security and other cross-cutting concerns.
Ben Sigelman: And you see organizations… I will say, ironically, I think a lot of people think of digital-native companies as being better at this. Sometimes the true enterprise companies are better at command and control and having a set way of doing things, and can actually create, I think, more fertile ground for successful distribution of work, because they have the managerial muscles to do that, right? So, I think we've kind of confused the idea of distributing service deployment with distributing every decision related to your service. And those are completely different levels of scope, and that's where I think all the complexity comes from on a per-organization basis. Certainly in the ecosystem in general, it's just way beyond the pale, but there are organizations who've cracked it. And I think what they did is limit the number of tools people can use within that organization.
Rob Zuber: Right. I love the analogy. I often talk about, or think about, optimizing for the team versus optimizing for the organization, right? So, to your point, I guess if I were the hippie… And you can tell me if this is a good interpretation, because I probably wouldn't describe myself as a hippie. I'm thinking about my own personal freedom and what will make… Or what will create the best environment for me, right? And so, okay, we are the small team, and we will move quickly and do our piece most effectively if we are constraining or making decisions completely within our own team. But that creates overhead for the organization, for that collection of teams, and finding the right balance there, I think, is hard. And to your point, from an enterprise perspective, I guess every new freedom feels like a positive, right?
Rob Zuber: So, if you sort of give folks a view and say, "Hey, here's where you have space to deliver on your own, to not be coupled to other parts of the organization, et cetera," that feels like a positive and a step forward, versus sort of your… I'll just describe it as your typical startup or smaller tech-forward organization where everyone's just making decisions all the time. And then it feels more like you're putting limiters on that to start paving those roads, and it feels like you're taking things away, right? So, it's partly the managerial muscle and partly, kind of, just maybe background or context of where you're coming from, in a way.
Ben Sigelman: Yeah. And I can update my metaphor [inaudible 00:16:34] as well as libertarians. Maybe I should change it to libertarians and termites or something like that. But yeah, I think you described it perfectly well. It's just about… I love making decisions and I love giving people authority to make their own decisions, but I think people get so fixated on the Layer 7 API that they're satisfying, and feeling that's the only thing that matters. And it kind of does, but it also kind of doesn't when you start thinking about the organization. I think the way you put it [inaudible 00:17:00] Rob, it's, yeah, thinking about the organization. And to a certain extent, the organization needs to lead with some practices for everyone to adopt, and if that's done correctly and done well, then I think things can really flourish, even at scale and with a lot of intrinsic complexity in the application.
Rob Zuber: Yeah. Yeah. That makes a lot of sense. So, we got there from reliability, but I think we landed pretty close to the other half, which is the velocity piece, right? So, a lot of this, choosing the stack that I'm comfortable with, making decisions about how I'm going to build things, is really in the name of velocity. And again, there's optimizing velocity for a team versus optimizing velocity for the organization, right? Like, what are we ultimately doing in service of our customers? But how does observability then directly play into velocity in terms of… I'll define it a little bit so we're not just talking about story points, but truly delivering customer capability or business value. Let's call it business value, because not every team is building customer features or whatever. How does observability really help me get there faster?
Ben Sigelman: Yeah. It's a good question. And this is an area where, if I'm being totally open, I don't think observability has delivered on its own potential. We've been working a lot on this within LightStep, but let me tell you where things are at right now. I think at this point, it's not automatic, but it's relatively easy to set things up so that with CI, obviously, you can make sure your tests pass, and maybe you can deploy to a staging environment to validate some basics, and then with CD, you can validate that the service you're deploying is healthy before blessing the release and letting the automated process continue, right? Great. So, definitely, that covers a certain category of failure that can occur.
Ben Sigelman: The thing that we have not yet done, but I think is absolutely possible, and which, for what it's worth, I think we've done some great work on at LightStep, is handling the case where a deployment succeeds locally but fails globally. You should be able to tell that automatically, but it requires an observability platform that's able to see cause and effect across the boundaries of services, not just through time correlation, because that can lead to a lot of false positives, but actually saying, "Well, the transactions flowing through this new version are creating an SLO violation way up the stack or way down the stack." And that's something that you can determine in a very literal way. And then you say, "Actually, it's not a good idea to move forward with this release. So, we're not just going to roll it back, but we're going to collect enough forensic information along with the rollback to allow the developer to figure out what went on and to remediate the issue before they roll forward again with a new update." Right?
Ben Sigelman: So that vision is much different where you’re not just looking for failures in your own service, but you’re looking for failures that are upstream or downstream, and although it’s possible and that the technology exists, it’s not something that’s widely practiced today. And I do believe that that unlocks a level of confidence and a level of accuracy that is presently just missing fully from a lot of organizations and would lead to much higher velocity and practice.
Rob Zuber: Yeah. I really like that you used one of my favorite words in there, confidence. I think the correlation between confidence and velocity is quite strong, in the sense that if I know that I can put something out into a production environment and I can constrain its impact while I see the effect on the rest of the system, right, because I've pushed it out to a small number of users, or I'm getting some sampling of it, or whatever that might be, then the risk of pushing something into production comes down significantly, therefore allowing me, honestly, to move faster, to make decisions: "Okay, well, this is an acceptable risk for this part of my product. Maybe it wouldn't be in this other part of my product, so I can move faster, see how things are going."
Rob Zuber: I think the enemy of velocity is really that deep amount of… I was going to say care, but I would be disappointed in myself, so we'll call it trepidation. Am I really comfortable putting this out? But if I feel like the systems are in place to allow me to have that input and feedback, then I've mitigated and managed the risk in an effective way that allows me to just keep moving. And I think we're at a point in the life cycle of software as a whole, of this industry, where being able to move quickly feels like the thing we're all talking about. And so, that's a business differentiator, and it feels like a really important piece of that.
Ben Sigelman: Yeah. To get back to your point about complexity from earlier, though, and this is not a problem observability solves on its own, but the level of mastery that's required to actually diagnose complicated failures in distributed systems right now is unreasonable, I think, if we want to take someone who's less than five years into their career and expect them to be able to manage these incidents. And I think we see a lot of organizations where keeping the system healthy requires the unofficial, permanent on-call status of eight or 10 super senior old-timers or principal engineers who know where all the bodies are buried and stuff like that. So, there is a need to get to programming models that allow for less experienced… I don't mean dumber, to be clear, I just mean literally less experienced people to successfully diagnose things, right?
Ben Sigelman: We have succeeded to a certain extent. It's not necessary these days to be able to understand assembly code in order to diagnose issues, right? At some point, you don't need to understand what goes on below a certain level, but it's still a pretty broad swath of the stack that you need to really master in order to diagnose modern failure modes. Observability is helpful in all of this, but we also need to see, I think, programming models (that's the term I would focus on) that are both robust enough to handle modern applications, but simple enough to be debuggable by a relatively untrained practitioner.
Rob Zuber: Yeah. That’s a really interesting train of thought. Do you have like particular, I guess, models or frameworks that come to mind as you [crosstalk 00:23:59]?
Ben Sigelman: Well, indeed I do, Rob.
Rob Zuber: I can tell there’s something going on over there.
Ben Sigelman: Well, it's a point of frustration for me. I've been thinking about this a lot lately. Again, this is beyond the scope of observability, right? But I think it's relevant to the audience. I think we're getting somewhere, like some of the stuff that's going on with Kubernetes and microservices is pointing in the right direction, but at the end of the day, you're still dealing with processes that exist between transactions, and that's hard to avoid. There are certain applications that have moved to a truly stateless serverless model, but that's the example in my mind of a much simpler programming model that's so simple that it doesn't actually perform that well for a lot of applications, either in terms of latency or in terms of throughput/cost; it just isn't really effective.
Ben Sigelman: And so, people are still trapped writing… You can dress it up however you want for packaging, but they're basically running a Linux process. I do think that it's going to be necessary for a long time to come to have a certain amount of state live on between requests, even if it's just a local cache and stuff like that, but something that looks… I guess the closest thing I can think of are gRPC service definitions, which are a lot more robust than just a serverless function definition. Something like that, plus some primitives for managing a local cache and some local state, seems like the right set of materials in my mind to define a virtual machine that lives on from request to request, and thus is performant, but also can do enough work to replace your sort of average microservice, right?
Ben Sigelman: There's always going to be a need for specialized things, databases and stuff like that, that don't fit that model, but for the average service, I still feel like we're operating too close to just running a plain old Linux process. We've solved a lot of the admin stuff, but the process itself still looks a lot like a Linux process. And I think that's the thing that leads to all of the required expertise, for better or worse.
Rob Zuber: Right. Yeah. There's a couple of things that come to mind there. One, you mentioned data stores. I feel like that's… I'll just say it's a constant source of frustration for me. I feel like it's an area where we've made some shifts, but they're sort of dressed-up packaging around the things that we've always done. My favorite example is actually query optimizers, because as a… Like, this is not a workgroup-shared database, right? I know exactly what I want the database to do, but I'm hoping that the database is going to choose to do it in that way. That doesn't feel quite right to me. So, there's that, which is probably an area of specialization that it would be great if we could shift people away from, but I think we've shifted them away by taking away knowledge, but not necessarily actually creating an… It's a leaky abstraction, let's say that, right? Because at some point, your database is going to do something you didn't expect, and then you have to learn a whole lot about it very, very quickly.
Rob Zuber: And then I would say probably concurrency would be the other thing, as you're talking about processes and having to actually understand that these things are working together, that there are limits on how much capacity there is in a particular system, all things that tend to get determined very late in the game, hopefully with at least great observability about what's happening. But it's interesting to think about how you could truly abstract programming away from the system in a more… I guess I want to say less leaky way, but it's bigger than that, right? Truly in a way that allows folks to not have to think about those things. And maybe that is in line with what you were describing about orgs that have paved the paths or paved the roads well, right? So that you can truly work quickly on something without worrying about some of these constraints.
Ben Sigelman: Yeah. I think that is essentially what I'm getting at. I arrived at that train of thought… I've been doing a lot of thinking about these two concepts of resources and transactions, where transactions, of course, are just the things that we do for our customers, right? Whether they're API transactions or end users on their mobile app or something like that, those are transactions. And then the resources are the things that live on between the requests and by definition are finite. And the trouble is that we're all trying to be customer-focused, for good reasons, okay? I don't need to explain that. So, that's transaction world, but the whole point of software is that you have a resource that is doing a job that exists not just between transactions, but across many at a time. If you're not sharing your resources between transactions, then your business sucks and your margins are terrible, right?
Ben Sigelman: So, our operators can only operate on resources. That is literally the thing that they can do. That's why all of our dashboards, despite everyone knowing it's sort of not the greatest practice, are still just a bunch of infrastructure metrics for your resources. That's what people are doing all day long from an operational standpoint, even though we're supposed to be focused on transactions. An SLO is something that actually bridges the two concepts, where you have a contract between the set of resources and the set of transactions; that's what an SLO actually is. But if you subscribe to this idea, you can have resources that exist and have unique identifiers well below the scope of a process, like a mutex lock, to use that example of a concurrency primitive. And it is very possible, and we've done demos about this too and it works: you can have software that will determine that a single customer, and you can detect the customer ID with minimal overhead in production, is creating contention around a specific mutex lock that's creating latency for some other set of processes.
Rob Zuber: Mm-hmm (affirmative).
Ben Sigelman: That kind of thing is possible, but we need a lot more discipline about, again, the software devs need to identify resources. And my purpose in talking about this on a podcast is not to say that this stuff is easy and available today, but it's totally feasible if we could just get the ground to firm up long enough to address some of the tagging issues that are actually at the core of this. It's not a hard technology problem; it's mostly just determining what things are called, how we assign unique IDs to resources, and how the hierarchy of resources maps to the workload of transactions. And then we could easily do this sort of diagnostic stuff automatically, even down to the level of individual locks and queues.
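The attribution Ben describes, tying contention on a tagged resource back to a specific customer, reduces to a simple aggregation once each lock-wait event carries both a resource ID and the transaction's customer ID. This is an illustrative sketch with made-up event data, not any real telemetry schema:

```python
# Sketch: if lock-wait events are tagged with both a resource ID (e.g., a
# specific mutex) and the transaction's customer ID, finding the tenant
# causing contention is a simple group-by. Event data here is invented.
from collections import defaultdict

lock_waits = [  # (resource_id, customer_id, wait_ms)
    ("mutex:cache-shard-3", "cust-42", 180),
    ("mutex:cache-shard-3", "cust-42", 220),
    ("mutex:cache-shard-3", "cust-07", 15),
    ("mutex:queue-writer", "cust-99", 30),
]

def top_contenders(events, resource):
    """Total lock-wait time per customer on one resource, worst first."""
    totals = defaultdict(int)
    for res, cust, ms in events:
        if res == resource:
            totals[cust] += ms
    return sorted(totals.items(), key=lambda kv: -kv[1])

print(top_contenders(lock_waits, "mutex:cache-shard-3"))
# [('cust-42', 400), ('cust-07', 15)]
```

The hard part, as Ben says, isn't the aggregation; it's agreeing on the naming and tagging discipline so that every resource, down to an individual lock or queue, has a stable identifier that transactions can carry through the system.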
Rob Zuber: I really appreciate that expression you just used, letting the ground firm up long enough. Do you think we’re in a mode of change for the sake of change, adjusting or changing so much about how we build software that our ability to bake in really good tooling is lost, because we’re always behind the evolution of the great new framework, the great new programming model, and never actually getting to the point where we have the stability to implement great tooling? If you look at a set of stacks, I guess you’ll probably find more mature tooling in the older ones than in the newer ones, at least back to some particular point in time. Does that make sense?
Ben Sigelman: Totally makes sense. I think the driver for a lot of this is that despite any of our wishes or even intentions, the reality is that the economics of software have changed, not just like a little bit, but changed violently, every five years since the beginning of the industry. And it’s very difficult if the entire ecosystem is migrating to some new set of technologies to avoid a whole bunch of collateral damage to other processes and tooling. And I think the CPU side is starting to stabilize a little bit, for better or worse, right? And we’re not seeing that same doubling that we saw for so long, right?
Rob Zuber: Mm-hmm (affirmative).
Ben Sigelman: But you have other aspects, like network economics are changing so quickly that it’s moving the trade-off between CPU and network. Obviously, SSDs did a lot to change things a few years before that, and so on and so forth. And every time that happens, you have to rearchitect the application, and then everyone is suddenly very grumpy with their whole toolchain. So, you’ve got a situation where for any customer, for any set of technologies, you can say, “Hey, what are you using to do X?” They’ll tell you, and then you say, “Are you satisfied with that?” And they’ll say, “Absolutely not.” And so, it just creates sort of infinite opportunity for disruption by tools and marketing messages that sound really good, and then end up also being somewhat disappointing.
Ben Sigelman: Maybe if everyone could compare notes and see that all of these things have problems, there wouldn’t be so much desire to change tooling, but I think there’s like an actual fundamental disruption of the economics combined with a lot of dissatisfaction about a very fragmented toolchain, and that’s been true really without exception for the last 30 years.
Rob Zuber: Oh, that’s, I think mildly terrifying is the best way that I can describe it.
Ben Sigelman: [crosstalk 00:32:58] It’s not all bad. We are making progress. I would definitely prefer to develop on today’s stack than the one we had 30 years ago.
Rob Zuber: Absolutely.
Ben Sigelman: Genuinely and without hesitation. It’s a really hard problem, and there’s a lot of urgency. And to be a little bit less doom and gloom, this COVID stuff is really bad, obviously, but the only reason that the commercial universe continued to function was that we actually have solved a lot of problems with software. If it wasn’t for the software, we would have been totally screwed as a species, and I’m not just talking economically, and it actually kind of held up, right? So, that’s great. So, it’s not like it’s all [inaudible 00:33:39] or something, it’s just really hard to plan in an environment with this much constant change.
Rob Zuber: So, one thing that comes to my mind, that I’m curious about your perspective on based on all that, is what I’ll call… I don’t know if I can make a slash vocally, I guess I could just say slash, low-code/no-code, which is a bit comical to me because that was the promise when I started programming also. But that shift towards, okay, we have higher and higher-level building blocks, how do we string those things together? I guess, is there actually a future in which we really reduce the complexity or make it possible to build rich applications without having to understand systems? That’s number one. And number two, are there things on that path that are really blocking us or holding us back in your mind?
Ben Sigelman: Oh, my God, I’m going to resist the urge to rant about low-code stuff and try to be productive. So, I think that it is absolutely ludicrous to imagine that low-code/no-code solutions will be used to replace conventional software engineering for… and this is the critical thing, for strategic, customer-facing, revenue-generating applications. It’s an absolute non-starter. However, I do think that low-code and no-code will always have a place in improving business processes. And if you’re talking about beating Excel, totally feasible. There are lots of ways to beat Excel, and that is a programming environment for sure; there are formulas, and it’s not very robust. So, I think the question is, if you’re talking about net new software applications that simply would not have existed before, because no one could figure out how to compile and run something, we can definitely use low-code/no-code to solve those problems.
Ben Sigelman: But what I don’t think is going to happen is… I do not think that people who are presently using, I guess I’ll just be judgmental about it, real programming languages to solve business problems, I don’t see those use cases being replaced by a low-code/no-code solution in any sort of common case, right? But it does open up a whole new field of potential applications and new jobs and so on and so forth, which is great, but I think of it as a new market, not a change to the existing market for development.
Rob Zuber: Right. That makes a lot of sense. I think that the new market is… I don’t know what else to call it, but internal tools and processes and all these things where you think, “Boy, I wish we could do this in a more automated fashion, but who’s going to pay software engineers to build this bespoke solution for us?” But if I could sort of [inaudible 00:36:21], and it’s very similar to problems other people have solved, maybe. But I’m pretty sure there won’t be a lot of self-driving cars built with low-code/no-code solutions anytime soon.
Ben Sigelman: I’m not getting in one. That’s for sure.
Rob Zuber: Yeah.
Ben Sigelman: Yeah. I think about this a lot. LightStep was acquired by ServiceNow, which has been awesome, for what it’s worth, and ServiceNow has done so much to advance the state of internal operations across the board, and [inaudible 00:36:46] very dirty brownfield problems. They’ve entered some very brownfield areas and done a lot of good by bringing some order to that chaos, right? And in that world, I think low-code/no-code has a very bright future. When you’re talking about the customer-facing, revenue-generating part of the business, which is where the sort of second digital transformation is actually happening, that’s where I’m less certain, and also where LightStep is focused, right? So, I do think that there’s something very compelling about combining those two: taking the very high-code, yes-code world and connecting it into sort of the brainstem of the business itself, and making that data available to analysts and people who are building stuff in the back office. That seems very compelling to me. But I just don’t think that low-code/no-code is going to be customer-facing for anything but the absolute simplest, most trivial applications.
Rob Zuber: Right. Well, I find that’s… We’re a little bit off-topic, but that’s why I’m here.
Ben Sigelman: [crosstalk 00:37:44].
Rob Zuber: I find that particularly fascinating, and so to try to bring it back around, now, I’m asking myself the question. As you create these opportunities, right? You have simpler applications, ability for more people to build those because it’s sort of gluing pieces together, is it sort of really important to bring the kind of operational tooling that we apply in other places because I’m stringing pieces together and I may not even know the implications of that? Or is it less important because I have a bunch of pieces that are sort of off the shelf, their behavior is well understood, and it’s unlikely that this is going to be where my issues are?
Ben Sigelman: Gosh, it’s such a good question. I have to admit that the no-code/low-code stuff is so early that it’s hard to tell what role tooling will play in it, but I’d like to tell myself that for that world to actually work correctly, things need to be simple enough that they just are correct or incorrect, more like math and science, if that makes sense: these programs just work or they just don’t work. But if you end up in something where you have to worry about contention and load and latency, forget it. And that’s the world where the sort of tooling that I think about all day long is more relevant, but it’s an open question. And I’d actually love to know the answer to that when we can look back in 10 years and see what happened.
Rob Zuber: All right. We’ll check in. We’ll put that on the calendar. Awesome. So, you mentioned in there the ServiceNow acquisition and some of the things that you’re doing there. To wrap us up here, I’m interested: how has all of your thinking changed? You’ve been working on this for a long time, thinking about this space for a long time. Clearly, you have some very deep and passionate thoughts about the space. How has that changed as being part of a larger organization, being part of a suite of tools? Is there anything that you’re thinking about fundamentally differently as a result of where you are as a business now?
Ben Sigelman: Yeah. So, this has been really interesting. I probably shouldn’t go into much detail, but we’ve been approached many times over the years about selling the company, and I never did it because, for me, it’s always been about mission and stuff like that. The reason the ServiceNow opportunity was so appealing to me is that, as a business, first of all, they’re just phenomenally effective. They deliver a lot of value, and the way they execute internally, it is a joy to behold how well they operate. And it’s fun for me personally and fun for my team to be a part of that. But the larger reason was that I still think of observability as being almost by and for engineers right now, and often by and for pretty sophisticated, experienced engineers.
Ben Sigelman: And I think the insights we’re getting out of these customer-facing, revenue-generating applications should become part of a larger story and our mission. I know that you have confidence in your mission, but we do too, and it’s to create confidence and clarity for the teams that are delivering the software that powers our daily lives. That’s what we’re all about. And I think those teams go well beyond development. And I love the idea of taking the insights that we can observe, in many cases automatically, in these customer-facing applications and making those available and useful to the entire enterprise. And that’s something that, standalone, would have taken us five to 10 years to get to. And at ServiceNow, we’re talking about doing it in a couple of quarters, right? So, it’s just really exciting for me to have observability be more broadly applicable, maybe not in the sense that people are going to the LightStep UI, but the data that we’re gathering is something that can just be put into the same brainstem that ServiceNow has already built for many of its roughly 7,000 enterprise customers, right? So, I find that to be really exciting from a product and a value-delivery standpoint. And that’s what’s motivating about it for me.
Rob Zuber: Right on. That’s super cool. I think that, to your point, having a mission, really being motivated to achieve something, and then being able to apply it sort of overnight… well, I love it, “overnight” is a great expression for everything. But to be able to take that and bring it to so many customers, to so many operations, and really see it get that next level of growth must be a very, very cool experience. Congratulations on that.
Ben Sigelman: Thanks.
Rob Zuber: Awesome. Well, thanks for joining me today, Ben. This has been as awesome as I anticipated. Love your perspective on so many different things. Thanks to everybody for tuning in. If you enjoy this podcast, share it with your friends, share it with people who aren’t your friends. I don’t know, share with everybody. Subscribe on whatever podcasting service you use. And if there’s something you want us to talk about, someone you want us to talk to, hit us up at CircleCI on Twitter. Thanks again, Ben.
Ben Sigelman: Thank you, Rob. It’s been a real pleasure.