Last year I laid down my thoughts about a role within the DevOps ecosystem in What is a CI/CD Engineer. In that post, I shared my rationale for why this role should exist, along with some of my initial ideas on the characteristics, duties, and skills that CI/CD Engineers should possess.
Since the original was published, I’ve had many engaging discussions with developers and DevOps professionals about my take on this role. These interactions and conversations provided me with loads of feedback. I was happy to learn that it resonated with the majority of folks that I spoke with.
In this post, I’ll refine some of the ideas, and share some additional learnings from the past year.
What is a CI/CD engineer?
Before we begin, I’ll briefly share the major points I made in What is a CI/CD Engineer? That post focused on two important elements that are at the core of the role: characteristics and duties.
Characteristics of a CI/CD Engineer
- Strong communication skills
- Keen analytical skills
- Ability to decompose complex processes into understandable components
- Proficiency in automating and optimizing processes
- Competent in team building and team communication strategies
Duties of a CI/CD Engineer
- Develop CI/CD principles
- Review and modify CI/CD principles, iteratively
- Maintain CI/CD tools/platforms (if applicable)
- Develop and maintain pipeline configurations
- Automate processes
These elements still hold true, and after much thought and many conversations, new details and patterns have emerged that validate and support some of my ideas. In this post I’ll share these revelations.
Emerging CI/CD Engineer patterns
DevOps and continuous delivery related concepts and patterns are gaining popularity in the industry. These concepts center around security, pipeline optimizations, and overall CI/CD health. Here is a list of the new CI/CD Engineer related concepts I’ll be discussing:
- Security/DevSecOps
- Pipeline optimizations
- Performance benchmarks
Security/DevSecOps for CI/CD Engineers
In my initial post, I suggested that security-related duties be an element of the CI/CD Engineer’s role. With the increased adoption of DevSecOps and developer-friendly tools, security responsibilities and processes are beginning to “shift left”. This essentially means that developers are able to confidently incorporate security-related activities into the initial stages of their CI/CD processes, hence shifting them left (toward the beginning of these processes) rather than to the right (toward the end).
A CI/CD Engineer can establish strong communication channels between security teams and those responsible for compliance/regulatory requirements. These communication channels generate valuable collaborations between developers and security teams that enable everyone to be well informed of critical requirements. These collaborations also facilitate adoption of DevSecOps principles, making developers knowledgeable about and invested in otherwise unknown security practices.
I’ve identified some security practices below that should occur in the “left” or initial segments of a CI/CD pipeline. These could very well be defined by CI/CD Engineers. These practices streamline required security tasks and provide valuable feedback loops, preventing wasted time and resources that occur when you continue to process downrange pipeline segments that will ultimately end in failure. Better to catch and fix issues earlier in the pipeline than later.
Below are some security practices that can be defined and maintained by a CI/CD Engineer:
- Vulnerability scans: Probe applications for security weaknesses that could expose them to attacks
- Container image scans: Analyze the contents and the build process of a container image in order to detect security issues, vulnerabilities, or deficient practices
- Regulatory/compliance scans: Assess adherence to specific compliance requirements
There are many other industry standard security practices that are usually implemented in release processes. The examples I mentioned above are just some of the tasks that a CI/CD Engineer can implement and manage in collaboration with security and DevOps teams to ensure all security and compliance requirements are automated and consistently being applied within CI/CD pipeline segments.
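The fail-early idea described above can be sketched in a few lines of Python. This is only an illustration, not how any particular CI/CD platform schedules work: the `Check` type, the `run_fail_fast` function, and the stage names are all hypothetical, and each scan is simulated with a lambda rather than a real scanner.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Check:
    """A single pipeline step: a name plus a callable that returns True on success."""
    name: str
    run: Callable[[], bool]


def run_fail_fast(checks: List[Check]) -> Tuple[bool, List[str]]:
    """Run checks in order and stop at the first failure.

    Returns (all_passed, names_of_checks_that_actually_ran).
    """
    executed: List[str] = []
    for check in checks:
        executed.append(check.name)
        if not check.run():
            # Stop here so later, more expensive segments never start.
            return False, executed
    return True, executed


# Hypothetical pipeline: security checks sit on the "left", ahead of later segments.
pipeline = [
    Check("vulnerability-scan", lambda: True),
    Check("container-image-scan", lambda: False),  # simulated finding
    Check("compliance-scan", lambda: True),
    Check("integration-tests", lambda: True),
    Check("deploy", lambda: True),
]

passed, executed = run_fail_fast(pipeline)
print(passed, executed)
```

In this simulated run the image scan fails, so the compliance scan, integration tests, and deploy segments never execute, which is the time and resource saving that shifting left is meant to deliver.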
Pipeline optimizations for CI/CD Engineers
In my experience, CI/CD pipelines are often treated in a “set it and forget it” manner. For whatever reason, teams invest lots of time and effort in automating their software development and release processes only to abandon work on them after implementation. This is mostly driven by fear of disturbing or breaking something that is critical to releasing software.
Though this is a potential reality, CI/CD pipelines are meant to represent software development and release processes. Therefore, they must be continually monitored, assessed, and adjusted to ensure they are not only accurately automating your software development and release processes, but doing so efficiently. These pipelines must be regularly revisited and tweaked and that can be an overwhelming task for individuals that aren’t aptly knowledgeable in all of the pipeline segments.
This is where the expertise of a CI/CD Engineer can add immense value. I view them as the CI/CD pipeline czar. They can ensure the pipelines are not neglected and are functioning efficiently. Below are some of my ideas for having CI/CD Engineers add value by properly maintaining and optimizing pipelines in collaboration with appropriate teams.
Many of the pain points I’ve experienced and often discuss with others revolve around managing the pipeline configurations in CI/CD tooling. These configurations define pipelines and serve as the execution code for the automation on CI/CD platforms. They are often expressed in YAML, domain-specific languages (DSLs), or some similar variant.
The syntax of these configurations is generally limited in capability, especially in regard to reusability. This mainly stems from the fact that a syntax like YAML is a declarative data format and not a programming language.
Restricted code reuse due to configuration syntax can be overcome, but it requires extra effort, such as building execution scripts that represent pipeline segments while simultaneously encapsulating functionality. These configuration reusability issues are common in most CI/CD platforms and generally have the same impact: the configurations become very difficult to maintain.
CircleCI has identified and addressed this config reusability problem by introducing configuration parameters and orbs, a very valuable mechanism that allows users to easily package, maintain, and implement reusable pipeline configurations.
Regardless of which config reuse tactic teams adopt, it’s very clear that these config reuse efforts can become very difficult for teams if there aren’t dedicated maintainers. That’s where I see a great opportunity for the CI/CD Engineer role to drive these efforts. The CI/CD Engineer can have a solid understanding of common patterns and functionality, which can be captured and encapsulated into useful dynamic execution code.
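One way to picture "dynamic execution code" that encapsulates a common pattern is a small script that generates job definitions from parameters instead of copy-pasting them. The sketch below is an assumption-heavy illustration: `make_job`, `make_pipeline`, the service names, the image tags, and the `make test` command are all hypothetical, and the output shape is only loosely modeled on YAML pipeline configs rather than validated against any platform's schema.

```python
import json
from typing import Dict, List


def make_job(image: str, commands: List[str]) -> Dict:
    """Build one job definition from parameters (hypothetical config shape)."""
    return {
        "docker": [{"image": image}],
        "steps": ["checkout"] + [{"run": cmd} for cmd in commands],
    }


def make_pipeline(services: Dict[str, str]) -> Dict:
    """Create one test job per service, all sharing the same encapsulated pattern."""
    jobs = {
        f"test-{name}": make_job(image, ["make test"])
        for name, image in services.items()
    }
    return {"jobs": jobs}


# Placeholder services and images, chosen only for illustration.
config = make_pipeline({"api": "python:3.12", "web": "node:20"})

# JSON is emitted here because most YAML parsers also accept it.
print(json.dumps(config, indent=2))
```

The design point is that a change to the shared pattern, such as adding a step, happens once in `make_job` rather than in every job that follows it, which is the maintenance burden a dedicated owner can take on.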
Properly sized compute/resource nodes
An aspect of maintaining blazing fast pipelines and valuable feedback loops is ensuring that CI/CD builds are executing on adequately resourced compute nodes. It’s very common that pipeline builds are executed on severely underpowered build resources, which directly contributes to slower build speeds and longer feedback loops. Determining the specifications for an adequately sized compute node is not a trivial task, mainly due to striking a balance between hardware requirements such as CPU, RAM, network, disk IO capabilities, and maintaining acceptable build times. Hardware requirements differ widely between technology stacks and services, and these details are often neglected by teams. In my experience, this can occur because teams don’t fully understand their tech stacks, and/or they assume increased compute node capacities are more expensive than they actually are.
Beefier compute nodes do tend to cost more in general, but the cost is often not as high as most teams fear. I’ve cut the duration of certain build jobs by more than half by moving them to an adequately powered compute node. One specific build job was taking five minutes to complete on a resource class using 2 CPU cores and 4GB RAM. I upgraded the resource class to 4 CPU cores and 16GB RAM, which completed the build in 2.1 minutes and only cost a few cents more.
In this scenario, the cost of the resource class increased by a very small amount, but the build time dropped tremendously, which created greater overall savings when factoring in other expenses like developers waiting for builds to complete. With shorter build times, developers get feedback faster and can move on to other tasks in their sprints.
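The trade-off can be made concrete with some back-of-the-envelope arithmetic. In the sketch below, only the two durations (5 minutes and 2.1 minutes) come from the scenario above. Every rate is a made-up placeholder, not real pricing: the per-minute compute prices and the per-minute cost of a waiting developer are assumptions you would replace with your own numbers.

```python
def job_cost(minutes: float, compute_rate: float, wait_rate: float,
             people_waiting: int = 1) -> dict:
    """Split the cost of one build into compute spend and time spent waiting."""
    compute = minutes * compute_rate
    waiting = minutes * wait_rate * people_waiting
    return {"compute": compute, "waiting": waiting, "total": compute + waiting}


# Hypothetical rates in dollars per minute (assumptions, not vendor pricing).
SMALL_NODE_RATE = 0.01   # 2 CPU cores / 4GB RAM
LARGE_NODE_RATE = 0.04   # 4 CPU cores / 16GB RAM
DEV_WAIT_RATE = 1.00     # one developer waiting on the build

small = job_cost(5.0, SMALL_NODE_RATE, DEV_WAIT_RATE)
large = job_cost(2.1, LARGE_NODE_RATE, DEV_WAIT_RATE)

print(f"compute per build: ${small['compute']:.3f} -> ${large['compute']:.3f}")
print(f"total per build:   ${small['total']:.2f} -> ${large['total']:.2f}")
```

Under these assumed rates the compute spend per build rises by a few cents while the total cost, once waiting time is counted, falls by more than half. The exact figures will differ for every team, but the shape of the comparison is what matters.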
CI/CD Engineers can assist teams in ensuring that builds are being executed on adequately resourced compute nodes. As with many duties shared by developers and operators, some of these seemingly irrelevant details are not monitored or addressed until the impact of undersized nodes is glaringly obvious. Having a role that monitors and adjusts these compute node issues can save teams lots of time and money, while also ensuring that build times are optimal and stay that way.
I’ve experienced teams who either don’t fully understand all of their tech stack’s capabilities, or don’t take full advantage of these capabilities. For example, I’ve interacted with individuals with comprehensive test suites that took over 45 minutes to complete within their pipeline builds. They were convinced that this was the only way to execute their tests. I was able to assist them in utilizing some of the multi-threaded processing capabilities included in their tech stack.
Most tech stacks have the capability to execute code in parallel, which means executing multiple elements and functions at the same time using the available unused CPU cores of the compute node. Parallelism, often loosely referred to as concurrency, is dependent on the tech stack. It is either offered natively, or it can be implemented by leveraging existing multi-threading libraries or features. These multi-threading capabilities speed things up dramatically. They can be engaged at the stack level, versus the CI/CD pipeline level, where code is executed as defined in the CI/CD configuration syntax. By executing code concurrently, execution times are optimized from the beginning, and when applied to a build job in a pipeline, those build jobs become substantially faster without having to tweak CI/CD configuration parameters in the build directives.
In this case, a CI/CD Engineer could assist teams in enabling multi-threading in the core tech stack and leveraging the often underused CPU cores when code is executing. This role can identify the build jobs that are not efficient and can collaborate on implementing effective execution strategies. These strategies can leverage multi-threading at the tech stack level, spawning concurrent process instances that utilize all the CPU resources available to them.
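As a rough illustration of stack-level parallelism, the Python sketch below splits a list of test files into groups and runs the groups at the same time with the standard library's `ThreadPoolExecutor`. The `chunk`, `run_in_parallel`, and `fake_runner` names and the file names are hypothetical. In real use the runner would start a test process per group; threads work well for that because they mostly wait on the child process, whereas CPU-bound Python code would typically use `ProcessPoolExecutor` instead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def chunk(items: Sequence[str], n: int) -> List[List[str]]:
    """Split items into n groups, dealing them out round-robin."""
    return [list(items[i::n]) for i in range(n)]


def run_in_parallel(groups: List[List[str]],
                    runner: Callable[[List[str]], int],
                    max_workers: int) -> List[int]:
    """Run one runner call per group concurrently; results keep group order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(runner, groups))


def fake_runner(group: List[str]) -> int:
    """Stand-in for launching a real test process; returns how many files it got."""
    return len(group)


# Hypothetical test files, split across 4 workers (for example, one per CPU core).
test_files = [f"test_{i}.py" for i in range(10)]
groups = chunk(test_files, 4)
results = run_in_parallel(groups, fake_runner, max_workers=4)
print(results)
```

Nothing in the CI/CD configuration changes here. The speedup comes entirely from how the code inside a single build job uses the cores it has already been given.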
These optimizations can also be implemented and executed within CI/CD platform builds. CircleCI also has a concept of parallelism which is not related to the multi-threaded concurrency I discussed earlier. CircleCI’s version of parallelism enables the execution of multiple build jobs to occur at the same time on individual executors. In any case, having a CI/CD Engineer overseeing these potential optimization opportunities is yet another justification for the role among DevOps teams.
Performance benchmarks for CI/CD Engineers
CI/CD Engineers help maintain and improve pipeline consistency and velocity without damaging quality. CircleCI has extensive data regarding CI/CD builds executed on the platform, and this data enables CircleCI to accurately identify, capture, and generate valuable performance benchmarks. These performance benchmarks are at the core of some interesting delivery metrics that can be used by teams as goals. The 2020 State of Software Delivery: Data-Backed Benchmarks for Engineering Teams report centers around how software development teams can measure their performance based on the following data points or benchmarks:
- Throughput: The number of workflow runs matters less than being at a deploy-ready state most or all of the time
- Duration: Teams want to aim for workflow durations in the range of five to ten minutes
- Recovery Time: Teams should aim to recover from any failed runs by fixing or reverting in under an hour
- Success Rate: Success rates above 90% should be your standard for the default branch of an application
These four benchmarks should be considered baseline metrics to be monitored, captured, and improved upon. Every organization and team has varying business-specific goals which impact development productivity. Controlling that impact hinges on expanding and improving the underlying processes.
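To make the benchmarks measurable, here is a minimal sketch of computing three of them from a list of workflow runs. The `Run` record, the helper functions, and the sample data are all invented for illustration, and the definition of recovery time used here (from the end of the first failed run to the end of the next successful one) is an assumption rather than the report's formal definition. Throughput is left out because the benchmark above frames it as being about deploy readiness rather than a raw count.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List


@dataclass
class Run:
    """One workflow run on the default branch (hypothetical record shape)."""
    started: datetime
    finished: datetime
    succeeded: bool


def success_rate(runs: List[Run]) -> float:
    """Fraction of runs that succeeded."""
    return sum(r.succeeded for r in runs) / len(runs)


def median_duration_minutes(runs: List[Run]) -> float:
    """Median workflow duration in minutes."""
    return median((r.finished - r.started).total_seconds() / 60 for r in runs)


def recovery_times(runs: List[Run]) -> List[timedelta]:
    """Time from the first failure in a streak to the next successful finish."""
    times: List[timedelta] = []
    failed_at = None
    for r in sorted(runs, key=lambda run: run.finished):
        if not r.succeeded and failed_at is None:
            failed_at = r.finished
        elif r.succeeded and failed_at is not None:
            times.append(r.finished - failed_at)
            failed_at = None
    return times


# Made-up sample data: (start offset in minutes, duration in minutes, succeeded).
base = datetime(2024, 1, 1, 9, 0)
samples = [(0, 8, True), (30, 6, False), (60, 10, True), (120, 7, True)]
runs = [
    Run(base + timedelta(minutes=start),
        base + timedelta(minutes=start + length), ok)
    for start, length, ok in samples
]

print(f"success rate:    {success_rate(runs):.0%} (benchmark: above 90%)")
print(f"median duration: {median_duration_minutes(runs)} min (benchmark: 5 to 10)")
print(f"recovery times:  {recovery_times(runs)} (benchmark: under an hour)")
```

With the sample data, duration and recovery time fall inside the benchmark ranges while the success rate does not, which is the kind of signal that tells a team where to focus first.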
The role of CI/CD Engineer can absolutely help teams conduct valuable monitoring and analysis on the current health of delivery and CI/CD performance. With their deep understanding of pipeline execution and technical expertise, they can collaborate on developing performance goals and metrics that will aid in achieving and maintaining the sought after results.
All too often software development teams are interested in increasing development velocity but are too disconnected from the actual build activities that factor into and control these outcomes. A CI/CD Engineer is closely coupled to the delivery process and can effectively monitor and address deficiencies as they occur, as well as expand and improve on new or existing benchmarks. They can provide near-realtime surveillance to enable teams to successfully hum along at their desired pace.
I’ve laid out an updated take on the CI/CD Engineer role, and I’ve added some fresh perspectives. In this update, I shared some of the new ideas and concepts I believe could be valuable elements to this role. These ideas were focused on security/DevSecOps, pipeline optimizations, and performance benchmarks, all of which are very impactful concepts to critical software delivery practices.
I would love to know your thoughts and opinions on the role so please join the discussion by tweeting to me @punkdata. Hopefully, we can develop this idea into existence together.
Thanks for reading!