There are a lot of challenges in testing your code against 3rd party APIs, so I’m going to explain some best practices that we’ve come up with or seen customers use over time at CircleCI.
There are two kinds of 3rd party APIs: ones that you don’t control, and ones that you do control. Services you don’t control include products like Twilio, Mailgun, GitHub, Intercom, or any other product or company that your team relies on in production. Services you do control are ones built by another team in your company, like your application’s storage backend or notification service.
3rd party APIs you don’t control
Very often, your application or service will use a 3rd party API as part of its role. So for example, you might notify customers via SMS and email, using Twilio and Mailgun. Or you might let customers log in via Facebook or Twitter, or GitHub, like we do at CircleCI.
You’ll want to test against these services for two reasons. Firstly, you’ll want to validate that your new code has the expected outputs. Obviously, this is just normal testing: make sure it works.
The second reason is to validate that 3rd party services don’t change in a way that breaks you. Because 3rd party services are out of your control, and maintained by a team that you don’t have access to, they can change their APIs in a way that breaks you, without telling you, possibly intentionally but often not. If they do, you need to know ASAP.
There are two ways to deal with this. First, you can test it live! That is, during your tests, make the actual API calls you would in production. You’ll need a test account, of course: in some cases, you can make an account via that company’s API during the test and delete it afterwards. Alternatively you might have a manually set up test account, which we do when we test against the Intercom API. Or finally, some companies, notably Stripe, automatically set up a test API key to use.
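As a sketch of what a live test can look like, the example below reads a test account’s credentials from the environment and skips when they’re absent, so the suite still passes on machines without a test account. The SMS provider, the `send_sms` client, and the `SMS_TEST_API_KEY` variable are all hypothetical:

```python
import os
import unittest

def send_sms(api_key, to, body):
    # Hypothetical client: in a real suite this would call the provider's
    # actual API over HTTP, authenticated with the test account's key.
    raise NotImplementedError

class LiveSMSTest(unittest.TestCase):
    # Only run when a test account is configured, e.g. in CI.
    @unittest.skipUnless(os.environ.get("SMS_TEST_API_KEY"),
                         "no test account configured")
    def test_send_sms_live(self):
        resp = send_sms(os.environ["SMS_TEST_API_KEY"],
                        to="+15550001111", body="hello from CI")
        self.assertEqual(resp["status"], "queued")
```

Gating live tests on an environment variable like this also keeps credentials out of the repository.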
The primary reason that live testing sucks is that these tests will be flaky. You’ll get 503s when the 3rd party service has imperfect deployments (which is often, in our experience) or has a minor misconfiguration. You’ll get 500s when they have a bug in their app. And you’ll randomly get 401 errors if you have too many tests running in parallel and you hit their rate-limits.
The standard solution is to mock out these interfaces. Mocks are a very common and well-known best practice. In a mock, you record the result of the API call, validate manually that it says what you expect it to, and then during your build you run the test against the saved response. This is very common, and I’m sure most people who have written tests have done it in some form at some point.
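As a minimal sketch, here is the pattern with Python’s `unittest.mock`: a recorded response stands in for the provider, and the test checks both the code’s output and the exact call it made. The SMS provider, the `notify_customer` function, and the response shape are hypothetical:

```python
import unittest
from unittest import mock

# A recorded response from a hypothetical SMS provider; in practice this
# would be a fixture file you checked in after validating it by hand.
RECORDED_RESPONSE = {"sid": "SM123", "status": "queued"}

def notify_customer(client, phone, message):
    """Application code under test: send an SMS, return the provider's id."""
    resp = client.messages.create(to=phone, body=message)
    return resp["sid"]

class NotifyTest(unittest.TestCase):
    def test_notify_uses_sms_api(self):
        client = mock.Mock()
        client.messages.create.return_value = RECORDED_RESPONSE
        sid = notify_customer(client, "+15550001111", "Your build passed")
        self.assertEqual(sid, "SM123")
        client.messages.create.assert_called_once_with(
            to="+15550001111", body="Your build passed")
```

This runs instantly and never flakes, because no network is involved.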
The problem with mocks is that you lose the second property: validating that 3rd party services don’t change. You can go on happily running your tests, believing that your Twilio integration works, when actually a subtle, undocumented behaviour that you rely on has changed, and you’re none the wiser.
So the correct way to deal with this is a hybrid approach. First you create the mocks and run your tests against them. This gives you protection from the flakiness of 3rd party services as you’re trying to ship code.
Secondly, you take your mocks and test them continuously.
When I say test them continuously, what I mean is that you need to make them part of your infrastructure monitoring.
Think about it: those services form part of your application. Monitoring key parts of your infrastructure is vital to make sure things keep working. If parts of your DB were to fail, you’d want monitoring in place so that you’d know. The same goes for 3rd party services: they are part of your infrastructure, so monitor them.
Treat them the same way you do any other infrastructure. You test them: continuously run the API call and validate that you get the expected result. You monitor them: stick the result of the test in Datadog or New Relic or whatever you use, so you can see spikes or dips in the graph when the service inevitably fails, and decide whether it’s an operational problem or just a minor glitch. Finally, you alert: if GitHub’s API fails, I want to get out of bed, because my service might be down too.
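A minimal sketch of such a continuous check, where everything named is a stand-in: `call_api()` wraps the real 3rd-party endpoint, `emit()` forwards to whatever metrics client you use, and the metric names and expected fields are hypothetical:

```python
import time

EXPECTED_KEYS = {"sid", "status"}  # the fields your code actually relies on

def call_api():
    # Stand-in for the real call, e.g. an HTTP POST to the provider.
    return {"sid": "SM123", "status": "queued"}

def emit(metric, value):
    print(f"{metric}:{value}")  # swap for your Datadog/New Relic client

def check_contract():
    """Run the real API call and record whether it still matches the mock."""
    start = time.monotonic()
    try:
        resp = call_api()
        ok = EXPECTED_KEYS <= resp.keys()  # contract: expected fields present
    except Exception:
        ok = False
    emit("third_party.sms.contract_ok", int(ok))
    emit("third_party.sms.latency_ms", round((time.monotonic() - start) * 1000))
    return ok
```

Run this on a schedule (cron, or a recurring CI job), and alert when `contract_ok` stays at 0 for longer than you’re comfortable with.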
Services you do control
This approach is useful for services you don’t control, but it may not be completely necessary within your own company.
Your app or service likely relies on other services or microservices that you may not be directly working on, but that fall under the general umbrella of code you have access to: probably another team, or another person on your team, works on them, but it’s not code you touch yourself.
In some ways you are in the same situation as before: you don’t directly control these APIs, so they may change and break unexpectedly. Or, the service may be flaky in the way that all software is flaky, so you’ll still want to avoid live testing. This puts you in the same place: a hybrid approach with an operational component.
However, since the other service is in your company, you’ll likely share an operations team with them, and the ops team will be able to tell that the root cause of your service’s failure is the other service (for smallish companies - in big companies like Amazon this is a major headache). This means they’re unlikely to page you, and the operational burden is significantly lower relative to 3rd party services in another company.
You can also do a little bit better here, because you do have access to that codebase in some way. You can set up integration tests such that when another version of the service is being built, you’re part of their tests in some small way. Either have their build pull in your service and test against it, or set up tests that pull in their code for every new version. Either way, you get advance warning of breakages.
A modern and very practical alternative, and a new best practice, is for each service to ship a container (or other similar construct) that you can run as part of your integration tests. If your upstream services are all containerized, you can run tests against those services very easily as part of your own. And as the number of services grows, shared libraries for cross-cutting concerns like logging, tracing, and metrics can help reduce the cost of building and maintaining those new services.
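As a sketch of that pattern, a test setup helper can start the upstream team’s container and wait for it to answer before the integration tests run. The image name, port mapping, and `/health` endpoint are all assumptions about what the upstream team ships:

```python
import subprocess
import time
import urllib.request

def run_command(image, port):
    # Map the (hypothetical) service's internal port 8080 to a local test port.
    return ["docker", "run", "-d", "--rm", "-p", f"{port}:8080", image]

def start_upstream(image, port, timeout=30):
    """Start an upstream team's container and wait for its /health endpoint.

    Returns the container id once healthy, so a teardown hook can stop it.
    """
    container_id = subprocess.check_output(run_command(image, port),
                                           text=True).strip()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            urllib.request.urlopen(f"http://localhost:{port}/health", timeout=1)
            return container_id  # healthy: tests can now hit localhost:<port>
        except OSError:
            time.sleep(0.5)
    subprocess.run(["docker", "stop", container_id])
    raise RuntimeError(f"{image} never became healthy")
```

Wire this into your suite’s setup (for example a pytest fixture), and your tests exercise the real upstream code without any network flakiness from a shared environment.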
Mock all API calls to avoid flaky tests.
If you don’t control the API you’re testing against, continuously test expected API results against your mocks. Connect this to your operational monitoring and metrics.
If you do control the API you’re testing against, make sure to have integration tests: your use cases in their tests, and their service, if possible, in your tests.