Continuous integration (CI) tools have been evolving towards flexible, general-purpose computing environments. They aren’t just used for running tests and reporting results, but often run full builds and send artifacts to external systems. If you’re already relying on a CI system for these other needs, it can be convenient to build and deploy your documentation using the same platform rather than pulling in an additional tool or service.
This post gives an overview of some popular options currently available for building and deploying documentation before diving into the details of using CircleCI to deploy documentation to GitHub Pages, a workflow that will be convenient for teams already using those tools for hosting code and running automated tests.
Options for deploying documentation
API documentation is generally rendered from a codebase using a language-specific documentation tool (`sphinx` for Python, `javadoc` for Java, etc.). Documentation can be built on a developer’s local machine, in a CI environment, or by a documentation-specific hosting service.
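Here is a rough illustration of what “rendering from a codebase” means for Python. Tools such as Sphinx’s `autodoc` extension import a module and read the docstrings attached to its objects. The sketch below pulls those same docstrings using only the standard library’s `inspect` module. The `greet` function and the `collect_api_docs` helper are hypothetical stand-ins for what a real documentation tool does far more thoroughly.

```python
from __future__ import annotations

import inspect
import types


def greet(name: str) -> str:
    """Return a greeting for ``name``.

    This docstring is the raw material a documentation tool would render.
    """
    return f"Hello, {name}"


def collect_api_docs(module: types.ModuleType) -> dict[str, str]:
    """Map each public function in ``module`` to its cleaned-up docstring."""
    docs = {}
    for name, obj in inspect.getmembers(module, inspect.isfunction):
        # Skip private helpers, as most documentation tools do by default.
        if name.startswith("_"):
            continue
        docs[name] = inspect.getdoc(obj) or ""
    return docs


if __name__ == "__main__":
    demo = types.ModuleType("demo")
    demo.greet = greet
    for name, doc in collect_api_docs(demo).items():
        print(f"{name}: {(doc.splitlines() or [''])[0]}")
```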
Services for hosting documentation are generally language-specific and can be a great low-friction option for a team that tends to write projects in a single language. For example, Read the Docs has been a standard tool in the Python community for many years. Read the Docs uses webhooks to watch commits to a hosted repository and automatically builds and renders documentation for each code update. It also offers some nice conveniences that could be difficult to replicate in your own pipeline, such as deploying multiple versions of documentation and maintaining links from rendered docs to source code. Its limitations come into play if teams need to deploy docs for additional languages or if builds require uncommon system dependencies that can’t be installed via the `pip` or `conda` package managers. Using a documentation-specific service also means maintaining another set of user accounts and permissions.
Conversely, the least infrastructure-dependent workflow is for developers to build docs locally and check the results into the project repository. Most teams prefer to keep generated content out of source control, both to keep code reviews simpler and to spare developers the chore of building and committing that content. Some teams, however, may enjoy seeing the revision history of documentation alongside the code. GitHub supports this workflow by offering the option to render the contents of a `docs` directory to GitHub Pages. Other hosting setups may still need a separate deploy step for documentation in a CI system.
If a team instead decides to build documentation as part of a CI flow, the content can be deployed to a wide variety of destinations, such as a locally maintained server, an object store like Amazon S3, GitHub Pages, or some other external hosting service. In most cases, the CI job will need some form of credentials to authenticate with the destination, and provisioning them can be the most complex part of the flow. One of the main advantages of GitHub Pages as a documentation host is the consolidation of permissions: any developer with admin access on a repository can set up deploys to GitHub Pages and provision the deploy keys needed for a CI service to commit content.
Options for deploying to GitHub Pages
GitHub offers three options for deploying a site to GitHub Pages, with different implications for workflows and credentials.
The oldest option, and the one we’ll use in our walkthrough, is to trigger deploys by pushing to a special `gh-pages` branch. This is generally maintained as an “orphan” branch with a revision history completely separate from `master`, which can be a bit difficult to maintain. In our case, we’ll build a CircleCI workflow that builds documentation, commits changes to the `gh-pages` branch using a library, and then pushes the branch to GitHub using a deploy key that we’ll provision.
The second option is to have GitHub Pages render the `master` branch. This can be useful for a repository that exists only to host documentation, but it doesn’t help much if your goal is to keep code and rendered documentation close together under a single permissions model.
Finally, GitHub Pages can render a `docs` directory on the `master` branch, which supports workflows where developers are expected to generate and commit documentation as part of their local development process. This requires no CI platform and no additional credentials, but, as discussed in the previous section, most teams prefer not to include generated content in their `master` branch.
Creating a basic Python project
Let’s build a small Python package that uses standard Python ecosystem tools for tests (`pytest`) and documentation (`sphinx`). We’ll configure CircleCI to run tests, build documentation, and finally deploy to GitHub Pages via a `gh-pages` branch. Full code for the project is available in jklukas/docs-on-gh-pages.
In a fresh directory, we’ll create a simple package called `mylib` with a single `hello` function. `mylib/__init__.py` looks like:

```python
def hello():
    return 'Hello'
```
We also need to create a `test` directory with an empty `__init__.py` file and a test module containing:

```python
import mylib


def test_hello():
    assert mylib.hello() == 'Hello'
```
To actually run the tests, we’ll need to have `pytest` installed, so let’s specify that in a `requirements.txt` file. We’ll also request `sphinx`, the documentation tool we’ll set up shortly, so the file lists both `pytest` and `sphinx`, one per line.
At this point, we can write a very simple CircleCI workflow containing a single job that runs our tests. We create a `.circleci/config.yml` that looks like:

```yaml
version: 2
jobs:
  test:
    docker:
      - image: python:3.7
    steps:
      - checkout
      - run:
          name: Install dependencies
          command: pip install -r requirements.txt
      - run:
          name: Test
          command: pytest
workflows:
  version: 2
  build:
    jobs:
      - test
```
We commit all these files, push them to a new GitHub repository, and enable that repository in CircleCI. CircleCI should generate an initial build for the `master` branch, which should come back green.
Now that we have a basic library with tests, let’s set up the documentation framework. At this point, you’ll need to have `sphinx` installed locally, so you may want to create a virtual environment using the `venv` tool and then call `pip install -r requirements.txt`, which makes the `sphinx-quickstart` command-line tool available for generating a documentation skeleton. We’ll invoke it like this:

```shell
sphinx-quickstart docs/ --project 'mylib' --author 'J. Doe'
# accept defaults at all the interactive prompts
```
`sphinx-quickstart` generated a `Makefile` for us, so building docs is as simple as calling `make html` from the `docs/` directory. Let’s codify that in a new job in our CircleCI flow. We can add the following underneath the `jobs` section of our `config.yml` file, alongside the `test` job:

```yaml
  docs-build:
    docker:
      - image: python:3.7
    steps:
      - checkout
      - run:
          name: Install dependencies
          command: pip install -r requirements.txt
      - run:
          name: Build docs
          command: cd docs/ && make html
      - persist_to_workspace:
          root: docs/_build
          paths:
            - html
```
`make html` populates a `docs/_build/html` directory containing the content that we want to deploy. The final `persist_to_workspace` step of our new `docs-build` job saves the contents of that directory to an intermediate location that will be accessible to later jobs in our workflow. For now, we’ll add this new job to our workflow:

```yaml
workflows:
  version: 2
  build:
    jobs:
      - test
      - docs-build
```
and commit the results.
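To make the `root`/`paths`/`at` relationship concrete, here is a local Python simulation of what we understand the two workspace steps to do. `persist_to_workspace` saves `paths` relative to `root`, and `attach_workspace` (used by the deploy job later on) restores them under the directory given by `at`. This is only a model built with `shutil` in a temporary directory, not CircleCI’s implementation, and the function names are hypothetical.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path


def persist_to_workspace(root: Path, paths: list[str], workspace: Path) -> None:
    """Save each entry of ``paths`` (relative to ``root``) into ``workspace``."""
    for rel in paths:
        src, dst = root / rel, workspace / rel
        if src.is_dir():
            shutil.copytree(src, dst, dirs_exist_ok=True)
        else:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)


def attach_workspace(workspace: Path, at: Path) -> None:
    """Restore everything saved in ``workspace`` underneath ``at``."""
    shutil.copytree(workspace, at, dirs_exist_ok=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_name:
        tmp = Path(tmp_name)
        # docs-build job: `make html` has written docs/_build/html/index.html.
        build_root = tmp / "docs-build" / "docs" / "_build"
        (build_root / "html").mkdir(parents=True)
        (build_root / "html" / "index.html").write_text("<h1>mylib</h1>")
        workspace = tmp / "workspace"
        persist_to_workspace(build_root, ["html"], workspace)

        # A later job in a fresh container attaches the workspace at docs/_build.
        deploy_build = tmp / "docs-deploy" / "docs" / "_build"
        attach_workspace(workspace, deploy_build)
        print((deploy_build / "html" / "index.html").read_text())
```

Because `root` is stripped from the saved paths, attaching at `docs/_build` recreates `docs/_build/html` in the later job.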
Even without deploying the rendered content, this job now serves as a check on the integrity of our docs. If `sphinx` is unable to run successfully, this job will fail, letting you know something is wrong.
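The failure signal works because a CI step fails whenever its command exits with a non-zero status. Below is a minimal Python sketch of that contract, built around a hypothetical `run_step` helper. In a real job, the command would be something like `["make", "html"]` run from `docs/`. Sphinx’s `-W` option, which turns warnings into errors, is one way to make this check stricter. With the `Makefile` generated by `sphinx-quickstart`, it can be passed as `make html SPHINXOPTS=-W`.

```python
from __future__ import annotations

import subprocess
import sys


def run_step(name: str, command: list[str], cwd: str | None = None) -> int:
    """Run one CI-style step and return its exit status (0 means success)."""
    print(f"==> {name}: {' '.join(command)}")
    result = subprocess.run(command, cwd=cwd)
    if result.returncode != 0:
        print(f"step {name!r} failed with exit code {result.returncode}")
    return result.returncode


if __name__ == "__main__":
    # Stand-ins for a healthy and a broken docs build.
    run_step("build docs (ok)", [sys.executable, "-c", "print('built')"])
    run_step("build docs (broken)", [sys.executable, "-c", "raise SystemExit(2)"])
```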
Deploying rendered docs to a gh-pages branch
We’re ready at this point to start building the final piece of our CI workflow: a job that deploys the built documentation by pushing it to the `gh-pages` branch of our repository.

We want `gh-pages` to be an “orphan” branch that tracks only the rendered docs and has a timeline separate from the source code in `master`. It’s possible to create such a branch and copy content into it using bare `git` command-line invocations, but doing so is full of edge cases and can easily lead to a corrupted work environment if anything goes wrong. Pulling in a purpose-built tool is a reasonable choice in this case, and there are several available as open source projects. The most popular among these at the moment is actually a Node.js module called `gh-pages` that includes a command-line interface, which is what we’ll use here.
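To show what the purpose-built tool saves us from, here is a hedged sketch of the kind of bare `git` sequence one might script by hand to publish a directory to an orphan branch. It only builds the command list (a dry run) and executes nothing. The exact steps are our own assumption for illustration, not what the `gh-pages` module actually runs. Note how the removal step operates destructively on the working tree, which is where a failure midway can leave a checkout in a confusing state.

```python
from __future__ import annotations


def publish_commands(dist_dir: str, branch: str = "gh-pages",
                     message: str = "Update docs",
                     branch_exists: bool = False) -> list[list[str]]:
    """Return, without running them, hand-rolled commands that replace the
    contents of ``branch`` with the files in ``dist_dir``."""
    commands: list[list[str]] = []
    if branch_exists:
        commands.append(["git", "checkout", branch])
    else:
        # --orphan starts a branch with no parent commit, i.e. separate history.
        # It keeps the current files staged, hence the removal step below.
        commands.append(["git", "checkout", "--orphan", branch])
    # Delete every tracked file so stale pages do not linger.
    commands.append(["git", "rm", "-r", "-f", "--quiet", "."])
    # Assumes dist_dir is outside the work tree; inside it, the rm step could
    # delete it and `git add --all` would commit it (one of the edge cases).
    commands.append(["cp", "-R", f"{dist_dir}/.", "."])
    commands.append(["git", "add", "--all"])
    commands.append(["git", "commit", "-m", message])
    commands.append(["git", "push", "origin", branch])
    return commands


if __name__ == "__main__":
    for cmd in publish_commands("/tmp/html"):
        print(" ".join(cmd))
```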
Let’s go ahead and write a first version of a `docs-deploy` job underneath the `jobs` section of our `config.yml` file and walk through the steps:

```yaml
  docs-deploy:
    docker:
      - image: node:8.10.0
    steps:
      - checkout
      - attach_workspace:
          at: docs/_build
      - run:
          name: Install and configure dependencies
          command: |
            npm install -g --silent gh-pages
            git config user.email "firstname.lastname@example.org"
            git config user.name "ci-build"
      - run:
          name: Deploy docs to gh-pages branch
          command: gh-pages --dist docs/_build/html
```
We use a `node` base image so that the `npm` package manager and the Node.js runtime are available. The `attach_workspace` step mounts the rendered documentation from the `docs-build` job into our container. Then we call `npm install` to download the target module, which includes a command-line utility, `gh-pages`, that we’ll invoke in the next step. The `git config` commands are required per the module documentation. Finally, the invocation of `gh-pages --dist docs/_build/html` copies the contents of the `html` directory into the root of the `gh-pages` branch and pushes the results to GitHub.
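Conceptually, the “copy into the root of the branch” step replaces whatever the branch held before with the current contents of `html`. By default, the `gh-pages` tool also removes files that no longer exist in the source directory; its `--add` option turns that off. The Python sketch below models only this file-level step and skips the commit and push. It assumes a checkout of the branch at a hypothetical `worktree` path and deliberately leaves `.git` alone.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path


def mirror_into_worktree(dist: Path, worktree: Path, add_only: bool = False) -> None:
    """Make ``worktree`` (apart from its .git directory) match ``dist``.

    With ``add_only=True`` existing files are kept, loosely mirroring the
    gh-pages tool's --add option.
    """
    if not add_only:
        for entry in list(worktree.iterdir()):
            if entry.name == ".git":
                continue  # never touch repository metadata
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()
    # Copy the *contents* of dist: dist/index.html lands at worktree/index.html.
    shutil.copytree(dist, worktree, dirs_exist_ok=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_name:
        tmp = Path(tmp_name)
        dist = tmp / "html"
        (dist / "_static").mkdir(parents=True)
        (dist / "index.html").write_text("new page")
        worktree = tmp / "gh-pages-checkout"
        (worktree / ".git").mkdir(parents=True)
        (worktree / "old.html").write_text("stale page")
        mirror_into_worktree(dist, worktree)
        print(sorted(p.name for p in worktree.iterdir()))
```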
Let’s add this new job to our workflow. The `workflows` section now looks like:

```yaml
workflows:
  version: 2
  build:
    jobs:
      - test
      - docs-build
      - docs-deploy:
          requires:
            - test
            - docs-build
          filters:
            branches:
              only: master
```
We made the `docs-deploy` job dependent on the other two jobs, meaning that it won’t run until both of them complete successfully. This ensures we don’t accidentally publish docs for a state of the repository that doesn’t pass tests. We also set a filter so that the `docs-deploy` job is skipped except for builds of the `master` branch. That way, we don’t overwrite the published docs with changes that are still in flight on other branches.
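The combined effect of `requires` and the branch filter can be modeled in a few lines of Python. This is a simplified sketch under our own assumptions, not CircleCI’s scheduler: a job runs only if its branch filter matches and every required job ran and succeeded. The hypothetical `WORKFLOW` dictionary just mirrors the YAML above.

```python
from __future__ import annotations

# Mirrors the workflow above: job name -> (required jobs, allowed branches or None).
WORKFLOW: dict[str, tuple[list[str], list[str] | None]] = {
    "test": ([], None),
    "docs-build": ([], None),
    "docs-deploy": (["test", "docs-build"], ["master"]),
}


def jobs_to_run(branch: str, outcomes: dict[str, bool]) -> list[str]:
    """Return the jobs that would run on ``branch``.

    ``outcomes`` maps a job name to whether it would succeed if run
    (jobs not listed are assumed to succeed).
    """
    ran_ok: dict[str, bool] = {}
    scheduled: list[str] = []
    # WORKFLOW already lists prerequisites first, so one pass is enough here.
    for job, (requires, branches) in WORKFLOW.items():
        if branches is not None and branch not in branches:
            continue  # filtered out on this branch
        if not all(ran_ok.get(dep, False) for dep in requires):
            continue  # a prerequisite was skipped or failed
        scheduled.append(job)
        ran_ok[job] = outcomes.get(job, True)
    return scheduled


if __name__ == "__main__":
    print(jobs_to_run("master", {}))
    print(jobs_to_run("feature-x", {}))
    print(jobs_to_run("master", {"test": False}))
```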
If we check in all these changes and let CircleCI run our job, our new job will fail:
ERROR: The key you are authenticating with has been marked as read only.
So there’s a bit more work we need to do to clean this up and make sure our CI job has the necessary credentials.
Provisioning a deploy key
As discussed in https://circleci.com/docs/gh-bb-integration/, GitHub provides a few options for giving a job access to change a repository. Generally, GitHub permissions are tied to users, so a credential must be tied either to a single human user account or to a dedicated machine user account that someone has to provision. There’s a lot of flexibility there for granting access across repositories, but it can become somewhat complex.
We opt instead to provision a read/write deploy key. This is an SSH key pair specific to a single repository rather than a user. This is nice for teams, because it means access doesn’t disappear if the user who provisions the key leaves the organization or deletes their account. It also means that any user who is an administrator on the repository can follow the steps below to get the integration set up.
Let’s follow the instructions in the CircleCI docs and apply them to our case.
We start by creating an SSH key pair on our local machine:

```shell
ssh-keygen -t rsa -b 4096 -C "email@example.com"
# Accept the default of no passphrase for the key (this is a special case!)
# Choose a destination such as 'docs_deploy_key_rsa'
```
We end up with a private key, `docs_deploy_key_rsa`, and a public key, `docs_deploy_key_rsa.pub`. We hand over the private key to CircleCI by navigating to https://circleci.com/gh/jklukas/docs-on-gh-pages/edit#ssh, hitting “Add SSH Key”, entering “github.com” as the hostname, and pasting in the contents of the private key file. At this point, we can go ahead and delete the private key from our system, as only our CircleCI project should have access to it.
The https://circleci.com/gh/jklukas/docs-on-gh-pages/edit#ssh page will show us the fingerprint for our key, which is a unique identifier that’s safe to expose publicly (unlike the private key itself, which is sufficient to give an attacker write access to your repository). We add a step to our `docs-deploy` job to grant the job access to the key with this fingerprint:

```yaml
- add_ssh_keys:
    fingerprints:
      - "59:ad:fd:64:71:eb:81:01:6a:d7:1a:c9:0c:19:39:af"
```
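The colon-separated fingerprint shown here is in the legacy MD5 format. It is the MD5 digest of the base64-decoded key blob from the `.pub` file, printed as hex pairs. `ssh-keygen -l -E md5 -f docs_deploy_key_rsa.pub` prints the same value with an `MD5:` prefix. Below is a small Python sketch for checking that the fingerprint in `config.yml` matches your public key; the `md5_fingerprint` name is our own.

```python
import base64
import hashlib
import sys


def md5_fingerprint(public_key_line: str) -> str:
    """Return the colon-separated MD5 fingerprint of an OpenSSH public key line.

    Expects the usual ``<type> <base64-blob> [comment]`` layout of a .pub file.
    """
    parts = public_key_line.split()
    if len(parts) < 2:
        raise ValueError("expected '<type> <base64-blob> [comment]'")
    blob = base64.b64decode(parts[1], validate=True)
    digest = hashlib.md5(blob).hexdigest()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python fingerprint.py docs_deploy_key_rsa.pub
    with open(sys.argv[1], encoding="utf-8") as handle:
        print(md5_fingerprint(handle.readline()))
```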
While we’re on the subject of security, we’ll head to https://circleci.com/gh/jklukas/docs-on-gh-pages/edit#advanced-settings and double check that “Pass secrets to builds from forked pull requests” is set to its default of “Off”. SSH keys are one of the types of secrets that we only want to make available if we trust the code being run; if we allowed this key to be available to forks, an attacker could craft a pull request that prints the contents of our private key to the CircleCI logs.
Now, we need to upload the public key to GitHub so that it knows to trust a connection from CircleCI initiated with our private key. We head to https://github.com/jklukas/docs-on-gh-pages/settings/keys > Add Deploy Key, give it the title “CircleCI write key”, check “Allow write access” (deploy keys are read-only unless this box is checked), and paste in the contents of `docs_deploy_key_rsa.pub`. If you haven’t already deleted the private key, be extra careful you’re not accidentally copying from the private key file, `docs_deploy_key_rsa`.
Some final fixups
Before we test that our CircleCI workflow can successfully push changes to GitHub, let’s address a few final details.
First, our built documentation contains directories starting with `_`, which have special meaning to `jekyll`, the static site engine built into GitHub Pages (by default, Jekyll leaves such directories out of the published site). We don’t want Jekyll to alter our content, so we need to add a `.nojekyll` file and pass the `--dotfiles` flag to `gh-pages`, since that utility otherwise ignores all dotfiles.
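A Python version of this fixup might look like the sketch below. It creates an empty `.nojekyll` marker only when the build output contains top-level entries beginning with `_` (Sphinx’s HTML output typically includes `_static` and `_sources`). The conditional and the `ensure_nojekyll` name are our own additions for illustration. Always creating the file, as a plain `touch` does, is simpler and equally valid.

```python
import tempfile
from pathlib import Path


def ensure_nojekyll(site_dir: Path) -> bool:
    """Create ``site_dir/.nojekyll`` if any top-level entry starts with '_'.

    Returns True when the marker file exists afterwards.
    """
    needs_marker = any(p.name.startswith("_") for p in site_dir.iterdir())
    marker = site_dir / ".nojekyll"
    if needs_marker:
        marker.touch()  # an empty file; its presence is what matters
    return marker.exists()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_name:
        html = Path(tmp_name)
        (html / "_static").mkdir()  # Sphinx-style asset directory
        (html / "index.html").write_text("<h1>mylib</h1>")
        print(ensure_nojekyll(html))
```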
Second, we need to provide a custom commit message that includes `[skip ci]`, which tells CircleCI not to start a new build when we push this content to the `gh-pages` branch. The `gh-pages` branch contains only rendered HTML content, not the source code and `config.yml`, so a build there would have nothing to do and would simply show up as failing in CircleCI. Our full job now looks like:
```yaml
  docs-deploy:
    docker:
      - image: node:8.10.0
    steps:
      - checkout
      - attach_workspace:
          at: docs/_build
      - run:
          name: Disable jekyll builds
          command: touch docs/_build/html/.nojekyll
      - run:
          name: Install and configure dependencies
          command: |
            npm install -g --silent gh-pages
            git config user.email "email@example.com"
            git config user.name "ci-build"
      - add_ssh_keys:
          fingerprints:
            - "59:ad:fd:64:71:eb:81:01:6a:d7:1a:c9:0c:19:39:af"
      - run:
          name: Deploy docs to gh-pages branch
          command: gh-pages --dotfiles --message "[skip ci] Updates" --dist docs/_build/html
```
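The following simplified Python check illustrates what the `[skip ci]` convention asks of a CI system. It looks for either `[skip ci]` or `[ci skip]`, the two markers CircleCI recognizes. It assumes the marker may appear anywhere in the message and matches it exactly. CircleCI’s own rules about where and how it matches are its implementation details, not something this sketch reproduces.

```python
SKIP_MARKERS = ("[skip ci]", "[ci skip]")


def should_skip_build(commit_message: str) -> bool:
    """Return True if the commit message carries a CI skip marker."""
    return any(marker in commit_message for marker in SKIP_MARKERS)


if __name__ == "__main__":
    print(should_skip_build("[skip ci] Updates"))   # the message used above
    print(should_skip_build("Add hello function"))  # an ordinary commit
```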
We’re ready to commit our updated configuration and let CircleCI run the workflow. Once it shows green, we should notice that our repository now has a `gh-pages` branch and that the rendered content is available at https://jklukas.github.io/docs-on-gh-pages/.
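For project repositories like this one, the published URL follows the pattern `https://<owner>.github.io/<repository>/`. The tiny helper below makes that mapping explicit. It is hypothetical and assumes no custom domain is configured for the site.

```python
def project_pages_url(owner: str, repo: str) -> str:
    """Return the default GitHub Pages URL for a project repository."""
    # Hostnames are case-insensitive; the lowercase form is the conventional one.
    return f"https://{owner.lower()}.github.io/{repo}/"


if __name__ == "__main__":
    print(project_pages_url("jklukas", "docs-on-gh-pages"))
```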
There is no one obvious “best way” to build and deploy documentation. The path of least resistance for your team is going to depend on the particular mix of workflows, tools, and infrastructure that you are already familiar with. Your organizational structure is important as well, as it will have implications for who needs to be involved to provision credentials and get systems talking to one another.
The particular solution presented here is currently a good fit for the data platform team at Mozilla (see an example in practice at mozilla/python_moztelemetry) because it is adaptable to different languages (our team also maintains projects in Java and Scala), it minimizes the number of tools to be familiar with (we are already invested in GitHub and CircleCI), the permissions model gives our team autonomy in setting up and controlling the documentation workflow, and we haven’t seen a need for any of the more advanced features available from documentation-specific hosting providers.
Jeff Klukas has a background in experimental particle physics, working both as a teacher and as a researcher helping discover the Higgs boson. He now works remotely from Columbus, Ohio on the Firefox data platform at Mozilla and was previously the technical lead for the data platform at Simple, a branchless bank in the cloud.