In the last 5 years, at Resultados Digitais, we have grown our product area to 60 people split into 7 teams. We now do over 12 deploys per day using Slack and commit every 5 minutes. But it was not like that some years ago.
My old process
In 2014 we had 300 customers and 7 developers in a single team. We used a kind of Gitflow process:
Branch off master
Development and tests
Code Review (PR)
Tests in staging environment
Merge to master
At that time, I used to manually perform steps 4, 5, 6 and 7, which obviously wouldn’t scale.
The problems with this approach
Merging and deploying someone else’s PR is not easy. I remember spending long nights trying to fix errors in production, trying to reach the developers who had worked on the feature, and fixing database migrations. It was affecting other aspects of my life: I never had enough time, and I was doing a poor job.
Centralizing the process on yourself can indicate a lack of trust. Letting go can be difficult for a founder in the early days of a company, but your job is to create something that won’t need you. Something that will scale.
This process was increasing the time it took to deliver value to our clients, which doesn’t make sense for any company. It was taking too long to roll out simple enhancements and features. It also encouraged a bad culture, where nobody takes responsibility for what they deliver and there is always someone else fixing the problems. On top of that, merging the branch into master before deploying caused problems: when something breaks, it is difficult to roll back, and this gets worse as you grow.
So we definitely needed to fix it in order to scale. The first thing we did was empower people. We had to make sure that people would make better choices and understand what was going on. But when you only empower people, they will make mistakes. So you need to fix 3 things.
#1 Enhance your processes
I won’t discuss whether you need to do TDD or not, but we do TDD. The point here is that automated tests are the safety net that allows you to evolve as a company. Our business rules will change, and so will our models, but we have tests that let us keep evolving consistently. You should not only have great coverage (we have 95% coverage), but also great tests that cover all the scenarios.
If you have automated tests, you need a way to run them and keep the teams informed about the results. We chose CircleCI because our business is not running and configuring a self-hosted CI server. We tried that in the past, thinking it would save money, but we found hidden costs: maintenance, updates, security and lost focus on our own business. Always find the best tool in the cloud, build a business whose costs can scale, and stay focused on what you do well.
We also use a SaaS product to check our code quality. Like any static analysis tool, it might not give you the complete picture, but it is a great guide when you are scaling teams.
Anyone who did not work on a PR is available to review it. A review is not just downloading the code and running it. The reviewer should also spend some time checking whether there are enough tests and whether they cover the different scenarios. If the tests are not green, or coverage or the static analysis score is below the standard, the PR does not even qualify for review.
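The review rules above can be sketched in a few lines of Python. This is only an illustration, not our actual tooling: the `PullRequest` fields, the two thresholds and the function names are all assumptions made up for the example.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for this example.
MIN_COVERAGE = 0.90        # minimum accepted test coverage (fraction)
MIN_QUALITY_SCORE = 3.0    # minimum static analysis score (made-up scale)


@dataclass
class PullRequest:
    tests_green: bool
    coverage: float        # fraction between 0.0 and 1.0
    quality_score: float   # score reported by the static analysis tool


def blocking_reasons(pr):
    """Return the reasons why a PR does not qualify for review (empty if it does)."""
    reasons = []
    if not pr.tests_green:
        reasons.append("tests are not green")
    if pr.coverage < MIN_COVERAGE:
        reasons.append("coverage is below the standard")
    if pr.quality_score < MIN_QUALITY_SCORE:
        reasons.append("static analysis is below the standard")
    return reasons


def eligible_reviewers(pr, team, contributors):
    """Anyone on the team who did not work on the PR may review it."""
    if blocking_reasons(pr):
        return []  # the PR is not ready for review yet
    return [person for person in team if person not in contributors]
```

For instance, a PR with green tests, 95% coverage and a passing quality score, written by one person in a three-person team, would be open to review by the other two.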
#2 Automate your tasks
So you have set enough rules to keep scaling and building great software. Now it is time to automate your tasks. If humans keep deploying or checking things manually, at some point they will probably make mistakes.
At Resultados Digitais, we have worked on a bot forked from Hubot. We also use hubot-deploy for deploying over Slack.
Whoever opens the PR is responsible for putting it in production, because that person and their team are the ones most interested in it. We use chat ops and deploy via Slack. Our bot also enforces the default rules we have set. For example, it won’t deploy if CI is not green (unless we explicitly force it). After we deploy, we keep checking data (response time and errors) in other monitoring services to see if everything is OK, and we roll back if needed.
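As a rough sketch of these two rules (not the code of our bot or of hubot-deploy), the deploy gate and the post-deploy check could look like the Python below. The class names, the fields and the rollback thresholds are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class DeployRequest:
    branch: str
    ci_green: bool
    force: bool = False  # explicit override, used only when we really need it


@dataclass
class Metrics:
    error_rate: float        # fraction of failed requests, 0.0 to 1.0
    response_time_ms: float  # average response time in milliseconds


def can_deploy(request):
    """Default rule: deploy only if CI is green, unless the deploy is forced."""
    return request.ci_green or request.force


def should_rollback(baseline, current, max_error_increase=0.01, max_slowdown=1.5):
    """Compare metrics before and after a deploy.

    The thresholds are hypothetical: roll back if the error rate grew by more
    than one percentage point or the response time grew by more than 50%.
    """
    errors_up = current.error_rate - baseline.error_rate > max_error_increase
    slower = current.response_time_ms > baseline.response_time_ms * max_slowdown
    return errors_up or slower
```

Keeping the override as an explicit flag means a forced deploy is always a visible, deliberate choice rather than a silent bypass of the rule.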
#3 Work on a better culture
You also need to work on a better culture. A strong culture will keep solving future problems. It will scale and evolve your processes.
To improve the development process, we started strengthening our devops practice: educating teams, developing people for that role and using cloud tools. We are also evolving our QA process. It is not just about adding more testers, although we do that in the review phases as well, but about raising quality to avoid bugs and incidents. So we started getting involved earlier, at the conception of a feature, to understand its impacts and risks.
All this knowledge should be shared between the teams. We hold presentations, talks and interest groups to keep evolving.
How we do it now
This is our process now:
- Branch off master
- Development and tests
- Pull Request
- Code Review
- Test in staging environment
- Deploy and rollout in production for company accounts
- Merge to master
- Rollout in production for highly trusted customers
- Rollout in production for alpha customers
- Rollout for everybody
We don’t force developers and teams to follow this whole flow, but we have guidelines on what to do in each case in order to avoid problems. For example, if you are refactoring feature X and it is a critical area, you should follow the whole flow. Basically, if we need to increase friction to guarantee quality and lower risk, we can do it. Otherwise, the developer is free to skip any of the rollout phases: internal version for company accounts, highly trusted customers, alphas and deploy to the rest of the base.
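One way to picture the staged rollout and the optional phases is the Python sketch below. It is not how our feature rollout is implemented. The `Stage` names mirror the phases listed above, while the functions and the idea that each stage widens the audience of the previous one are assumptions for the example.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Rollout phases, ordered from the smallest audience to the largest."""
    COMPANY = 1          # internal version for company accounts
    HIGHLY_TRUSTED = 2   # highly trusted customers
    ALPHA = 3            # alpha customers
    EVERYBODY = 4        # the rest of the base


def plan_rollout(skip=frozenset()):
    """Return the phases to go through, in order, leaving out the skipped ones."""
    return [stage for stage in Stage if stage not in skip]


def is_visible(account_stage, current_stage):
    """Assumption: an account sees the change once the rollout reaches its group."""
    return account_stage <= current_stage
```

A low-risk change might use `plan_rollout(skip={Stage.HIGHLY_TRUSTED, Stage.ALPHA})`, going from company accounts straight to everybody, while a refactoring of a critical area would call `plan_rollout()` and go through every phase.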
The main message here is that if you are doing something that doesn’t scale, think about the next step and start changing today. We have to keep evolving.
Guest author Bruno Ghisi is the CTO of Resultados Digitais, the Brazilian leader in Marketing Automation with a product called RD Station that has more than 3,500 customers. You can follow him at @brunogh and follow Resultados Digitais’ dev/product blog at Ship It (pt_BR).