When Dollar Shave Club began using CircleCI, they found it improved not just their CI/CD pipeline, but their production monitoring and release management operations as well, while making their developers’ lives easier. They came to us wanting to share the story of their transition to CI and how using CircleCI has helped them improve developer experience while increasing the quality of their software and their speed of delivery. Read on for their story.
Transitioning to CI/CD
by Dollar Shave Club engineering manager Jon Ong and backend engineer Yuki Falcon
In 2017, we switched from manually testing each code change to CI/CD with test automation. This change opened up some new possibilities for us that greatly improved our workflow. Here are some of the biggest avenues we’ve opened up as a result of this transition:
We can dark-launch new features using feature flags and/or A/B tests. Decoupling deployments from feature releases has brought some great benefits. We’re able to conduct user acceptance tests in production, and the feedback loop is faster and better because we get input from stakeholders earlier in the development process. Additionally, we’re able to push to production faster and more frequently, which brings its own rewards, such as higher engineering velocity, better code reviews, and easier tracing of regression causes.
QA sign-off is no longer required for each deploy. Engineers are now primarily responsible for the quality of their own code. This increased our QA department’s overall bandwidth by allowing them to shift their focus from manual regression testing to holistic testing. They’re also able to provide user/feature feedback earlier in our product design cycle.
Our frontend team can now deploy straight to production. Bypassing our shared staging environment forced us to increase our reliance on CI tests and our dynamic staging environments (DQAs). Created by our very own Infrastructure & Platform Services Engineering Manager Benjamin Keroack, DQAs are replicas of the production environment that incorporate pull request changes.
We use Percy for our visual regression tests, Codecov for our code coverage analysis and Rollbar for error monitoring.
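To make the dark-launch idea above concrete, here is a minimal sketch of a feature flag check that decouples deploying code from releasing a feature. It is not our actual implementation: the flag store, the flag name `new_checkout`, the stakeholder allow list, and the percentage rollout are all assumptions made for illustration.

```python
import hashlib

# Hypothetical in-memory flag store; a real system would read this from a
# flag service or config. All names and values here are illustrative.
FLAGS = {
    "new_checkout": {
        "enabled": True,
        "rollout_percent": 10,             # share of users who see the feature
        "allow_users": {"stakeholder-1"},  # e.g. people doing acceptance testing
    },
}


def bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range [0, 100)."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag_name: str, user_id: str, flags=FLAGS) -> bool:
    """Return True if the deployed-but-dark feature should be shown to this user."""
    flag = flags.get(flag_name)
    if flag is None or not flag["enabled"]:
        return False
    if user_id in flag["allow_users"]:
        return True
    return bucket(flag_name, user_id) < flag["rollout_percent"]
```

With a check like this, the code for a feature can ship to production while `rollout_percent` stays at 0, and the allow list lets stakeholders try it in production before anyone else sees it.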
Improving our CI Tests
This increased reliance on CI tests forced us to revamp our testing strategy. Our testing landscape up to this point was bleak. Our unit tests ran on CircleCI 1.0 and the end-to-end (e2e) tests ran on Jenkins. However, running unit tests on CircleCI 1.0 was not ideal due to its parallelism strategy and the flakiness of our tests, and Jenkins gradually became unable to handle our scale. In fact, many of our test suites weren’t great, largely due to the way we wrote the tests. Poor testing patterns in Ruby Selenium caused inconsistent results. Ember acceptance tests were shoddy because of rotting code, tests that weren’t written the way the test framework intended, and an unreliable test framework.
To solve those headaches, we implemented a couple of solutions. First, we rewrote our e2e test runner to support parallelism and automatic retries of our tests. The retries occur on Sauce Labs to provide us with a recording of the network requests. Second, we migrated to CircleCI 2.0 workflows, which allowed us to split the tests into several distinct jobs, simplifying retries and increasing parallelism.
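As a rough illustration of the first change, here is a minimal sketch of a runner that executes tests in parallel and retries failures automatically. It is not our actual runner: the function names, the thread pool, and the retry limit are assumptions, and the Sauce Labs recording step is only indicated by a comment.

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_retries(test, max_attempts=3):
    """Run one test callable and return (name, passed, attempts_used)."""
    for attempt in range(1, max_attempts + 1):
        try:
            test()
            return test.__name__, True, attempt
        except Exception:
            # A real runner would record the failure here and could route the
            # retry to a service that captures network traffic for debugging.
            continue
    return test.__name__, False, max_attempts


def run_suite(tests, workers=4, max_attempts=3):
    """Run tests concurrently; return {test name: (passed, attempts_used)}."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda t: run_with_retries(t, max_attempts), tests))
    return {name: (passed, attempts) for name, passed, attempts in results}
```

Recording the number of attempts per test is a deliberate choice in this sketch: a test that passes only on its second or third attempt is a flakiness signal worth tracking even though the suite is green.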
The results were better than we could have hoped. CircleCI 2.0 jobs were twice as fast as those on CircleCI 1.0. Our e2e tests went from taking 30-40 minutes to about 15 minutes. Engineers noted the improved UX and complained less about failing tests. Frontend engineers wrote their own automation tests, which improved test reliability, speed, and debuggability.
CircleCI 2.0 allows us to rerun only the failing tests instead of the entire test suite.
Improving our Monitors
We used to use New Relic for monitors. For a variety of reasons, we decided to write our own monitors that run on CircleCI 2.0 (or via a Docker container). This allowed us to save money, run tests more frequently, and best of all, run monitors as tests.
Our monitors are hooked up to DataDog, which is hooked up to PagerDuty.
Our monitors run every minute in a single CircleCI job, split-parallelized across 16 containers. CircleCI 2.0’s scheduled workflows allow us to see these running monitors.
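One simple way to split a list of monitors across parallel containers is sketched below. CircleCI sets the `CIRCLE_NODE_INDEX` and `CIRCLE_NODE_TOTAL` environment variables in parallel jobs; everything else here, including the round-robin strategy and the function names, is an assumption for illustration and not necessarily how our monitors are split.

```python
import os


def shard(items, node_index, node_total):
    """Return the items assigned to one container (round-robin over sorted names)."""
    if node_total < 1 or not 0 <= node_index < node_total:
        raise ValueError("node_index must be in [0, node_total)")
    # Sorting first makes every container agree on the same ordering.
    return [item for i, item in enumerate(sorted(items)) if i % node_total == node_index]


def monitors_for_this_container(monitors, env=os.environ):
    """Pick this container's monitors using CircleCI's parallelism variables."""
    index = int(env.get("CIRCLE_NODE_INDEX", "0"))
    total = int(env.get("CIRCLE_NODE_TOTAL", "1"))
    return shard(monitors, index, total)
```

Outside of CircleCI the variables are unset, so the defaults make a local run execute every monitor in a single process.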
Improving developer UX
We improved developer UX further by triggering CI/CD automatically.
Automatically updating dependencies
We use Greenkeeper to automatically open PRs when dependencies publish a new version and to keep our lockfiles in sync with those updates. With auto-merging, a PR is automatically merged when all required checks (linting, tests, monitors) pass. Engineers can now pin dependencies and keep them up-to-date by approving these Greenkeeper PRs and labeling them as “automerge”. We can also easily find out which dependencies break our application.
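The auto-merge rule described above boils down to a small decision function. The sketch below is illustrative only: the `PullRequest` shape, the check names, and the status strings are assumptions, and it does not call any real GitHub or Greenkeeper API.

```python
from dataclasses import dataclass, field

# Illustrative names for the required checks mentioned in the text.
REQUIRED_CHECKS = ("lint", "tests", "monitors")


@dataclass
class PullRequest:
    labels: set = field(default_factory=set)
    checks: dict = field(default_factory=dict)  # check name -> "success", "failure" or "pending"
    approved: bool = False


def should_automerge(pr, required=REQUIRED_CHECKS):
    """Merge only approved PRs labeled "automerge" whose required checks all passed."""
    if "automerge" not in pr.labels or not pr.approved:
        return False
    # A missing or pending check counts as not passed.
    return all(pr.checks.get(name) == "success" for name in required)
```

Treating a missing check as a failure is the conservative choice in this sketch: a dependency update is merged only when every required signal is present and green.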
Monitoring our pull request CI/CD pipeline
No-code-change PRs are created every 15 minutes to track how well our CI/CD pipeline functions. The information goes to DataDog, alerting us when our CI/CD pipeline goes awry.
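A sketch of how results from these no-code-change PRs could be turned into alerts is shown below. The thresholds, the `CanaryRun` record, and the alert messages are hypothetical; in our setup the data goes to DataDog, which this example does not model.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class CanaryRun:
    duration_s: float  # time from opening the PR to all checks finishing
    passed: bool       # whether every check passed


def pipeline_alerts(runs, max_median_s=900.0, max_failure_rate=0.25):
    """Return human-readable alerts for a window of recent canary runs."""
    if not runs:
        # No data at all is itself a sign that the pipeline is not running.
        return ["no canary runs recorded"]
    alerts = []
    failure_rate = sum(not r.passed for r in runs) / len(runs)
    if failure_rate > max_failure_rate:
        alerts.append(f"failure rate {failure_rate:.0%} exceeds {max_failure_rate:.0%}")
    median_s = median([r.duration_s for r in runs])
    if median_s > max_median_s:
        alerts.append(f"median duration {median_s:.0f}s exceeds {max_median_s:.0f}s")
    return alerts
```

Because the canary PRs contain no code changes, any failure or slowdown they show points at the pipeline itself and not at a particular commit.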
Switching to CircleCI 2.0 and having engineers write their own tests has paid off. We now have a faster and more reliable CI/CD pipeline. Our engineers only worry about the Git workflow (create PRs, get approvals, get GitHub checks to pass, press merge) and feature implementation. Engineers now make dozens of small commits to master per day instead of just a few, increasing velocity and improving the ability to pinpoint regressions. Happy coding!