Tutorials | Last Updated Oct 22, 2024 | 8 min read

Tips for optimizing Docker builds

Angel Rivera

Developer Advocate, CircleCI

Docker images are the blueprints for containers, providing the instructions for how a container is spawned. They are used as the primary image in the Docker executor. This post will address a few overlooked concepts that can help you optimize Docker image development and the build process.

How do you build a Docker image?

The Docker build process is triggered by using the Docker CLI tool to run the docker build command. This command builds a Docker image based on the instructions specified in a Dockerfile. A Dockerfile is a text document that contains all the ordered commands a user would call on the command line to assemble an image.

A Docker image consists of read-only layers, each of which represents a Dockerfile instruction. The layers are stacked, and each one is a delta of the changes from the previous layer. Think of these layers as a form of cache; updates are made only to the layers that change instead of updating every layer on every change.

The example below shows the contents of a Dockerfile:

FROM ubuntu:18.04
COPY . /app
RUN make /app
CMD python /app/app.py

Each instruction in this file represents a separate layer in a Docker image:

  • FROM creates a layer from the ubuntu:18.04 Docker image
  • COPY adds files from your Docker client’s current directory
  • RUN builds your application with make
  • CMD specifies what command to run within the container

These commands add to the Docker image when they are executed during the build process, though as you’ll see later, not all of them produce filesystem layers.
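
To see this in action, you can build the image and then inspect its layers. This is a minimal sketch; the myapp tag is just a placeholder:

# Build the image from the Dockerfile in the current directory
docker build -t myapp .
# List the layers that make up the image
docker history myapp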

If you’re interested in learning more about images and layers, the Docker documentation covers them in depth.

Optimizing the image build process

Now that you’ve learned about the Docker build process, here are some optimization tips to help you build images more efficiently.

  • Use ephemeral containers
  • Don’t install unnecessary packages
  • Implement .dockerignore files
  • Sort multi-line arguments
  • Decouple applications
  • Minimize the number of layers
  • Leverage the build cache

Use ephemeral containers

The image defined by your Dockerfile should generate containers that are ephemeral. In this context, ephemeral means that a container can be stopped and destroyed, then rebuilt and replaced with a freshly spawned container, using the absolute minimum of setup and configuration. Ephemeral containers are disposable: every instance is new and unrelated to previous container instances. When developing Docker images, leverage as many ephemeral patterns as possible.
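
For example, passing the --rm flag to docker run removes the container as soon as it exits, reinforcing the disposable pattern (the myapp image name is a placeholder):

# The container is deleted automatically when its process exits
docker run --rm myapp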

Don’t install unnecessary packages

Avoid installing unnecessary files and packages. Docker images should remain as lean as possible. This improves portability, shortens build times, reduces complexity, and keeps file sizes small. For example, installing a text editor in a container is rarely necessary. Don’t install any application or service that is not essential.
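
On Debian-based images, one way to keep installs lean is to skip recommended-but-optional packages. This is a sketch of the pattern; curl stands in for whatever package you actually need:

RUN apt-get update && apt-get install -y --no-install-recommends \
  curl \
  && rm -rf /var/lib/apt/lists/*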

Implement .dockerignore files

The .dockerignore file excludes files and directories that match the patterns declared in it. This helps you avoid unnecessarily sending large or sensitive files and directories to the daemon, and potentially adding them to public images.

To exclude files not relevant to the build without restructuring your source repository, use a .dockerignore file. This file supports exclusion patterns similar to .gitignore files.
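
Here is a small illustrative .dockerignore; the entries are examples rather than a one-size-fits-all recommendation:

# Version control history is not needed in the image
.git
# Dependencies are installed inside the image instead
node_modules
# Keep local secrets and logs out of the build context
.env
*.log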

Sort multi-line arguments

Whenever possible, ease later changes by sorting multi-line arguments alphanumerically. This helps you avoid duplicate packages and makes the list much easier to update. It also makes PRs a lot easier to read and review. Adding a space before a backslash (\) helps as well.

Here’s an example from Docker’s buildpack-deps image on Docker Hub:

RUN apt-get update && apt-get install -y \
  bzr \
  cvs \
  git \
  mercurial \
  subversion \
  && rm -rf /var/lib/apt/lists/*

Decouple applications

Applications that depend on other applications are considered “coupled.” In some scenarios, they are hosted on the same host or compute node. This is common in non-container deployments, but for microservices, each application should exist in its own individual container. Decoupling applications into multiple containers makes it easier to scale horizontally and to reuse containers. For example, a decoupled web application stack might consist of three separate containers, each with its own unique image: one to manage the web application, one to manage the database, and one for an in-memory cache.

Limiting each container to one process is a good rule of thumb. Use your best judgment to keep containers as clean and modular as possible. Then, if containers depend on each other, you can use Docker container networks to ensure that these containers can communicate.
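
As an illustration, a Docker Compose file along these lines (the image names are placeholders) runs each concern in its own container on a shared network:

services:
  web:
    image: myapp-web    # the web application
    ports:
      - "8000:8000"
    depends_on:
      - db
      - cache
  db:
    image: postgres:16  # the database
  cache:
    image: redis:7      # the in-memory cache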

Minimize the number of layers

Only the RUN, COPY, and ADD instructions create layers. Other instructions create temporary intermediate images and do not increase the size of the build. Where possible, use multi-stage builds and copy only the artifacts you need into the final image. This allows you to include extra tools and/or debugging information in your intermediate build stages without increasing the size of the final image.
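
For instance, a multi-stage build for a hypothetical Go application might look like this; only the compiled binary reaches the final image:

# Build stage: contains the full Go toolchain
FROM golang:1.22 AS builder
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /bin/app .

# Final stage: copies in just the artifact we need
FROM alpine:3.19
COPY --from=builder /bin/app /bin/app
CMD ["/bin/app"]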

Leverage the build cache

In building an image, Docker steps through the instructions in your Dockerfile, executing each in order. At each instruction, Docker searches for an existing image in its cache to use instead of creating a new duplicate image. This is the basic rule that Docker follows:

Starting with a parent image that is already in the cache, the next instruction is compared against all child images derived from that base image to see if one of them was built using the exact same instruction. If not, the cache is invalidated.

In most cases, simply comparing the instructions in the Dockerfile with one of the child images is sufficient. However, certain instructions require more examination and explanation.

For the ADD and COPY instructions, the contents of the file(s) in the image are examined and a checksum is calculated for each file. The last-modified and last-accessed times of the files are not considered in these checksums. During the cache lookup, the checksum is compared against the checksum in the existing images. If anything has changed in the files, such as the contents and metadata, then the cache is invalidated.

Aside from the ADD and COPY commands, cache-checking does not look at the files in the container to determine a cache match. For example, when processing a RUN apt-get -y update command, the files updated in the container are not examined to determine if a cache hit exists. In that case, the command string is used to find a match.

Once the cache is invalidated, all subsequent Dockerfile commands generate new images and the cache is not used. To leverage the cache, order your Dockerfile so that instructions that change frequently appear near the bottom of the file, while instructions that change rarely appear near the top. That way, an invalidation late in the file leaves the earlier, cached layers untouched.
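
A common way to apply this, sketched here for a hypothetical Node.js app, is to copy the dependency manifests and install dependencies before copying the rest of the source. A source-only change then reuses the cached dependency layer:

FROM node:20
WORKDIR /app
# Changes rarely: dependency manifests only
COPY package.json package-lock.json ./
RUN npm ci
# Changes often: the application source
COPY . .
CMD ["node", "server.js"]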

Optimize Docker image builds in CI pipelines

So far, you’ve learned about optimizing Docker image builds from a code and Docker CLI perspective. Implementing these optimization tactics in CI pipelines is not much different. CircleCI also offers a specific Docker build optimization that can dramatically speed up your automated Docker build jobs.

First, all of the optimization concepts mentioned in the previous sections are worth implementing in your CI pipeline, especially caching. Even when the Dockerfile changes, leveraging the cache is still the most effective way to reduce build time.

How does this work as a part of your CI pipeline? When using the Docker executor as the runtime for build jobs, you can leverage a feature called Docker layer caching (DLC) to speed up those builds.

DLC is a great feature to use when building Docker images is a regular part of your CI process. It caches the individual layers of any Docker images built during your jobs and reuses unchanged layers on subsequent CircleCI runs, rather than rebuilding the entire image every time.

The less your Dockerfiles change from commit to commit, the faster your image-building steps will run. You can use DLC with both the machine executor and the remote Docker environment (setup_remote_docker). Remember that DLC is useful only when you are creating your own Docker image with docker build, docker compose, or similar Docker commands; it does not decrease the wall-clock time every build spends spinning up its initial environment. If you’re interested in learning more about DLC, you can read about it in our documentation.
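
As a sketch, enabling DLC in a job that uses the remote Docker environment looks roughly like this; the base image and build command are placeholders:

version: 2.1
jobs:
  build:
    docker:
      - image: cimg/base:stable
    steps:
      - checkout
      # Reuse unchanged image layers across runs
      - setup_remote_docker:
          docker_layer_caching: true
      - run: docker build -t myapp .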

Note: For more Docker content, see Guide to using Docker for your CI/CD pipelines.

Summary

This post covered optimization techniques for building Docker images. These recommendations will serve as a guide for developing Docker images efficiently, and they can speed up your CI pipelines tremendously.

Most folks don’t need to build their own custom images. At CircleCI, we have built a fleet of CI-optimized Docker images for you to use in your CI pipelines.

Thank you for following along with this post.
