From Code Cowboy to Infrastructure Architect
Senior Software Engineer
5 things I wish I’d known as a developer
There’s a difference between understanding how to write an app and understanding the infrastructure that works underneath it. When I transitioned from a developer to an SRE (site reliability engineer) this year and saw how distributed systems worked at an actual company, it really changed the way I thought about writing code.
When you’re doing software development exclusively, you’re not thinking about what’s going to hold your code up once it’s running. When I deployed my first website, I had no idea how to set it up. I didn’t care much about writing tests. Let’s keep it real: I just cared about adding new features and seeing the end product. I didn’t understand how to host static images or configure a server.
Here are some things I wish I’d been more focused on when I was just working on development:
1. Writing accurate commit messages & clear documentation
Make sure that your code is efficient and commented before you put it up. Clear commit messages make it easier for people to understand what you were doing during code reviews, and they help the debugging process if something’s not passing. I write my commit messages file by file; that gives the code reviewer a chance to review what I added and why I added it. Before, I was just trying to go fast, but now I realize it saves more time to do it properly the first time. When that stuff goes up on GitHub and you haven’t written clearly what you did, it leads to a bad review session and problems that take much longer to fix.
2. Writing focused tests
There was a tweet I once read that said “your favorite company is 30 minutes away from a total shutdown”. What that means to me is that the companies we like are not as indestructible as we think they are. It takes a lot of effort and teamwork to keep them running; it’s hard to run a company. You start to realize at an infrastructure level just how badly things can go wrong. One way to help prevent that is by writing good tests. Your tests should be deterministic, clear, and easy to read. If you test your stuff well, you’ll catch the regressions that come from changing code, and you’ll help prevent larger incidents.
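As an illustration of what “deterministic” can mean in practice, here is a minimal sketch in Python. The function name `is_expired` and the 30-minute lifetime are hypothetical, chosen only for this example. The idea is that the code under test receives the current time as an argument instead of reading the system clock, so the test gives the same result no matter when it runs.

```python
from datetime import datetime, timedelta, timezone


def is_expired(created_at, ttl, now):
    """Return True once `ttl` has fully elapsed since `created_at`.

    `now` is passed in by the caller (hypothetical design for this sketch)
    so that tests never depend on the real clock.
    """
    return now - created_at >= ttl


def test_token_expires_after_ttl():
    # Fixed timestamps: the outcome does not depend on when the test runs.
    created = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
    ttl = timedelta(minutes=30)

    # One minute before the deadline the token is still valid.
    assert not is_expired(created, ttl, now=created + timedelta(minutes=29))
    # Exactly at the deadline it counts as expired.
    assert is_expired(created, ttl, now=created + timedelta(minutes=30))


test_token_expires_after_ttl()
```

Each assertion checks one boundary and says in a comment what it expects, which keeps the test easy to read during a code review. A version that called `datetime.now()` inside `is_expired` would be harder to test reliably, because its answer would change with the wall clock.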
3. Ensuring my code is secure
I learned about a lot of security aspects that made me more aware of code’s lifecycle. Anytime you’re putting something on the internet, code integrity becomes really important. You could be putting not only your customers at risk, but also your company’s information. When people use your applications, they trust that their privacy and their information are being secured within your application. Without security measures in place, you don’t have your customers’ confidence. After the Equifax breach and its widespread effects, I realized how important it is to make security a priority when you’re dealing with large codebases. All good design takes security into consideration, whether it’s architecting a house that is earthquake resistant or creating strong software. Good design means prioritizing security throughout every part of the code.
4. Time management
Being on-call as an SRE has forced me to improve my time management skills. It’s improved my life a lot too. I always want to do a lot of things at once, but I know that when I’m on-call, all my attention is for keeping the site up and everything running smoothly. So managing my time to do my work projects is more important than ever. Now I’m using new tools like Evernote to organize my thoughts and using task lists to make sure my tasks are complete. I make a list of tasks for the week: that’s how I get into the groove of my work week. Throughout my day I cross the tasks out, and if I don’t finish them, I add them onto the next day’s page. I’m more organized with completing my projects, and when my on-call shift comes, I can keep track of what I did not complete the day before. This means I can balance my on-call shift with my development projects.
5. Being attentive
I’ve learned the importance of maintaining clear attention during my on-call shift, as I always have to be focused and ready to react. I can’t get distracted by other things if I’m tasked with maintaining the infrastructure and security of our product. My team is relying on me in case something goes wrong. My ability to respond is important because if I can prevent an incident from escalating, it helps people higher up the escalation levels keep their attention on the big things they’re working on. I achieve this focus and attention by using the smaller task successes to motivate me for the larger ones, like a snowball rolling down a hill.
I think more like an architect now. Now I’m thinking: how does this thing scale? I’d never really considered that before. Becoming an SRE has pushed me to understand how our distributed systems and infrastructure work as a whole, and given me more insight into the results of my decisions.