This post, written by CircleCI Technical Content Marketing Manager Ron Powell, originally appeared on The New Stack.
Most applications begin with a small to medium-sized user base. Even with migration projects, you would not immediately open your new application to the entire existing user base. Instead, you would first test with some internal users, then open up to early adopters.
Nevertheless, if your application is successful, at some point you will face the need to scale it.
The need to scale is a nice problem to have. It means your application is popular and needs to grow. There are many dimensions in which an application may need to scale. For instance, it may need to scale in terms of offered features, or in terms of processing or storage. In this article, we will focus on scaling in terms of daily active users, or requests per unit of time.
Efficiency enhancements for scaling are among the most difficult issues to solve. While there is no silver bullet that can increase efficiency for every application, there are some strategies and techniques that can be applied in many scenarios. We will go over some points to consider when you start to make decisions about scaling your app.
What to scale and how far?
At a high level, there are two kinds of scaling we should consider when we need to enhance the ability of the application to handle more users and requests (as opposed to adding features or functionality).
First, we can scale the application’s ability to handle requests by providing more powerful hardware. The advantage of this approach is that it doesn’t require changing the application code — you just run it on a more powerful server. But at some point it becomes impossible to add more processing power, bigger attached storage, faster networking, or additional memory.
Second, you can scale by adding additional machines. This approach lets you take advantage of multiple commodity servers to do the work. However, to make the best use of network performance and work distribution, you may need to optimize your application code — and potentially re-architect the application (though doing so makes further scaling easier).
If you start with a monolithic app, then scaling the hardware may be your first choice. Here, you just need to provision faster machines with more processors and additional memory for the code to run faster.
However, this only makes a single instance of your application faster, and only for as long as you can find more powerful hardware. If you want global scaling, serving a worldwide audience with the lowest possible latency, you will need to take a geo-distributed approach to your worker nodes. This means adding machines in datacenters around the world, where they can provide low-latency responses.
This is where using the microservice approach becomes valuable: you can split your application into multiple dedicated services, which are then Dockerized and deployed into a Kubernetes cluster.
While at some point scaling becomes necessary, it’s a good practice to first see how you can optimize your current application. There are several development and deployment practices that are helpful before you scale — and that also make scaling easier when you do get to that step.
First, to verify that your application behaves correctly, you should have decent test coverage. Solid unit and regression test coverage also prevents problems arising from the codebase changes that scaling may later require.
Ideally, all testing should be fully automated and should run on each build. Continuous integration pipelines are a key part of this. Continuous integration (CI) ensures code changes are automatically tested and merged into your main branch. Continuous delivery then automatically deploys changes to staging or production infrastructure, but only after they have passed the CI tests and checkpoints.
In the deployment phase, you can still run regression tests — for example, to verify performance in a stress test. This provides the basis for any enhancements.
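As a minimal sketch, such a performance regression check can be as simple as timing a hot code path in CI and failing the build when an agreed latency budget is exceeded. The `handle_request` function and the 50 ms budget below are hypothetical stand-ins:

```python
import time

def handle_request():
    # Stand-in for the code path under test (hypothetical workload).
    return sum(i * i for i in range(1000))

def p95_latency_ms(fn, runs=200):
    """Measure the 95th-percentile latency of fn, in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    return samples[int(len(samples) * 0.95)]

# Fail the build if latency regresses past the agreed budget.
assert p95_latency_ms(handle_request) < 50
```

Running this on every build turns a performance expectation into an automated checkpoint rather than something discovered in production.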
You can also automate resource configuration. If you employ an Infrastructure as Code (IaC) approach, using tools like HashiCorp Terraform or AWS CloudFormation to automatically provision and configure servers, you can even test and verify the configuration code used to create your infrastructure. Migrations can also be at least semi-automated, leading to a high degree of flexibility.
Scaling data storage
One of the most difficult things to scale in any application is the database. The core issue is explained by the CAP theorem:
- Consistency — every read receives the most recent write or an error.
- Availability — every request receives a (non-error) response, without the guarantee that it contains the most recent write.
- Partition tolerance — the system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.
Of these three properties, a database system can only guarantee two. For instance, if you value consistency and availability, you give up partition tolerance, and your database system is reduced to a single-node instance (potentially with fallbacks or read-only mirrors).
Some existing database systems address this issue. For instance, many NoSQL systems (such as MongoDB) include partitioning out of the box. On the other hand, a classical relational database system (such as MySQL) can still be scaled when given enough resources, but it has its limits. You can extend its abilities using table partitioning and sharding, though these can be difficult to apply and may require restructuring your table schemas.
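To illustrate the idea behind sharding, the sketch below routes each key to a fixed shard by hashing. The shard names and the MD5-based routing are illustrative assumptions, not specifics of MySQL or MongoDB:

```python
import hashlib

# Hypothetical shard names for illustration.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]

def shard_for(user_id: str) -> str:
    """Map a key to a shard with a stable hash, so the same
    user always lands on the same database node."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# The same key always routes to the same shard.
assert shard_for("user-42") == shard_for("user-42")
```

Note that naive modulo routing remaps most keys whenever the shard count changes, which is why production systems often use consistent hashing instead.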
The first step is to calculate how much traffic (or storage capacity) your database system needs to handle for the anticipated load. Knowing this number is crucial. Before exhausting hardware capabilities, you should consider software optimizations. Quite often database queries, as well as write operations, can be optimized. This not only saves money on the required hardware, but also lets your application handle more users.
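To make this concrete, here is a back-of-the-envelope load estimate; every number below is purely illustrative:

```python
# Rough capacity estimate (illustrative numbers only).
daily_active_users = 50_000
requests_per_user_per_day = 40
db_queries_per_request = 3

requests_per_day = daily_active_users * requests_per_user_per_day  # 2,000,000
queries_per_day = requests_per_day * db_queries_per_request        # 6,000,000

# Average queries per second, and an assumed 5x peak-to-average ratio.
avg_qps = queries_per_day / 86_400
peak_qps = avg_qps * 5

print(f"average ~ {avg_qps:.0f} QPS, peak ~ {peak_qps:.0f} QPS")
```

Even a crude estimate like this tells you whether you are provisioning for tens or thousands of queries per second, which changes the conversation about hardware and optimization entirely.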
Introducing an efficient cache layer can also be a great way to achieve more with less. When moving to more distributed architectures, such as microservices, you will end up with some caching instances regardless. Many development teams introduce caching with fast, easy-to-use in-memory data stores (like Redis). Full pages or API responses may be suitable targets, and even expensive database queries can often be cached easily.
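A common pattern here is cache-aside: check the cache first, and only query the database on a miss. The sketch below uses a plain dict in place of a real store like Redis, and the query function is a hypothetical placeholder:

```python
import time

# A dict stands in for an external store such as Redis (illustrative setup).
cache = {}
TTL_SECONDS = 60

def expensive_query(user_id):
    # Placeholder for a slow database query.
    time.sleep(0.05)
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    """Cache-aside: serve from the cache when fresh, otherwise
    fall back to the database and populate the cache with a TTL."""
    entry = cache.get(user_id)
    if entry and time.time() - entry["stored_at"] < TTL_SECONDS:
        return entry["value"]                       # cache hit
    value = expensive_query(user_id)                # cache miss
    cache[user_id] = {"value": value, "stored_at": time.time()}
    return value

get_user(7)   # slow: hits the database
get_user(7)   # fast: served from the cache
```

With a shared store like Redis, every application instance benefits from a value cached by any other instance, which is exactly what you want once you scale horizontally.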
Scaling file storage
Similar to general data storage, you will eventually need to think about distributed file storage. Storing a file on an attached or even integrated disk is by definition a bottleneck. While network-attached storage (NAS) or even a storage area network (SAN) may be helpful for on-premises systems, you can also leverage cloud services for this common task.
Cloud providers like AWS and Azure have dedicated services to upload and download files. These services not only provide options for geo-distribution, caching, fragmentation, checks, and more, they also allow setting policies for accessing the file (read and write).
One benefit of using a cloud provider is the potential IO usage. Transferring files — especially massive ones in the hundreds of MB or even GB range — will have a huge impact on your network capacity. Leveraging the cloud provider’s network saves bandwidth for handling the application’s requests. Note that cloud egress costs can become expensive very quickly, so try to calculate your expected bandwidth usage ahead of time to avoid any surprises.
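A rough way to avoid such surprises is to estimate egress up front. The volumes and the per-GB price below are assumptions for illustration only; check your provider's current pricing:

```python
# Back-of-the-envelope egress cost estimate (illustrative numbers).
downloads_per_day = 10_000
avg_file_size_gb = 0.2          # 200 MB per file
egress_price_per_gb = 0.09      # assumed $/GB; varies by provider and region

monthly_egress_gb = downloads_per_day * avg_file_size_gb * 30
monthly_cost = monthly_egress_gb * egress_price_per_gb

print(f"{monthly_egress_gb:,.0f} GB/month, roughly ${monthly_cost:,.2f}")
```

Running the numbers before launch makes it much easier to decide whether a CDN, aggressive caching, or smaller file variants are worth the engineering effort.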
All of this assumes that you have either a lot of files, large files, or both. If you only have a few smaller files, then using your own distributed and efficiently scaling database system for storing and retrieving them may be sufficient.
Even when data and file storage are fully distributed and easily scalable, your application might not perform well. The reason is that a single entry point forms a natural bottleneck. There are multiple techniques to mitigate this.
DNS rotation eliminates the situation where a single IP address receives all requests going to your domain. This gives you multiple entry points, but you still have only a single application instance behind each of them.
Another technique is to use a load balancer to divide traffic among multiple running instances. This is another advantage of using a cloud provider: providers offer services that implicitly use a load balancer, as well as explicit load balancers. For instance, on AWS you can leverage Elastic Load Balancing to distribute incoming traffic.
More generally, you can use software-defined networking and the capabilities of resource orchestrators. One example is the load balancing built into Kubernetes services. Alternatively, consider dedicated software like HAProxy for load balancing.
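The core idea behind the simplest strategy, round-robin load balancing, can be sketched in a few lines; the backend addresses here are made up:

```python
import itertools

class RoundRobinBalancer:
    """Minimal round-robin load balancer: each request is
    handed to the next backend in the pool."""

    def __init__(self, backends):
        self._pool = itertools.cycle(backends)

    def next_backend(self):
        return next(self._pool)

lb = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print([lb.next_backend() for _ in range(4)])
# -> ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.1']
```

Real load balancers like HAProxy or Elastic Load Balancing layer health checks, connection draining, and weighting on top of this basic rotation, but the distribution principle is the same.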
Scaling your application can be intimidating. Thinking in terms of the individual application components, you can design a plan to ensure proper scaling of these components — and, eventually, of the application as a whole.
Instead of trying to be prepared for anything, try to think of specific scenarios and decide how you’ll respond if that scenario happens. Be sure to include advantageous scenarios — like doubling business, adding support for expanding into a new region, or rolling out new products and features — not just problems you want to avoid.
A good CI tool is essential to help you be flexible as you scale, thus avoiding the temptation of initial over-engineering. It lets you keep resources lean, focused and targeted — and lets you scale with speed and confidence.
CI adoption also changes developer team culture. When it comes to developer team success, finding the right DevOps metrics to measure is crucial. Learn how to measure DevOps success with four key benchmarks for your engineering teams, in the 2020 State of Software Delivery: Data-Backed Benchmarks for Engineering Teams.