Backup and Recovery
This chapter describes failover or replacement of the services machine. Refer to the Backup section below for information about possible backup strategies and procedures for implementing a regular backup image or snapshot of the services machine.
Specify a spare machine, in an alternate location, with the same specs for disaster recovery of the services machine. Having a hot spare regularly imaged with the backup snapshot in a failure scenario is best practice.
At the very least, provide systems administrators of the CircleCI installation with the hostname and location (even if co-located) of an equivalent server on which to install a replacement server with the latest snapshot of the services machine configuration. To complete recovery, use the Installation procedure, replacing the image from that procedure with your backup image.
Backing up CircleCI Data
This document describes how to back up your CircleCI application so that you can recover from accidental or unexpected loss of CircleCI data attached to the Services machine:
|If you are running CircleCI in an HA configuration, you must use standard backup mechanisms for the external datastores. Contact email@example.com for more information document for more information.|
Backing up the Database
If you have not configured CircleCI for external services, the best practice for backing up your CircleCI data is to use VM snapshots of the virtual disk acting as the root volume for the Services machine. Backups may be performed without downtime as long the underlying virtual disk supports such an operation as is true with AWS EBS. There is a small risk, that varies by filesystem and distribution, that snapshots taken without a reboot may have some data corruption, but this is rare in practice.
|"Snapshots Disabled" refers to Replicated’s built-in snapshot feature that is turned off by default.|
Backing up Object Storage
Build artifacts, output, and caches are generally stored in object storage services like AWS S3. These services are considered highly redundant and are unlikely to require separate backup. An exception is if your instance is setup to store large objects locally on the Services machine, either directly on-disk or on an NFS volume. In this case, you must separately back these files up and ensure they are mounted back to the same location on restore.
Snapshotting on AWS EBS
There are a few features of AWS EBS snapshots that make the backup process quite easy:
To take a manual backup, choose the instance in the EC2 console and select Actions > Image > Create Image.
Select the No reboot option if you want to avoid downtime. An AMI that can be readily launched as a new EC2 instance for restore purposes is created.
It is also possible to automate this process with the AWS API. Subsequent AMIs/snapshots are only as large as the difference (changed blocks) since the last snapshot, such that storage costs are not necessarily larger for more frequent snapshots, see Amazon’s EBS snapshot billing document for details.
Restoring From Backup
When restoring test backups or performing a restore in production, you may need to make a couple of changes on the newly launched instance if its public or private IP addresses have changed:
Launch a fresh EC2 instance using the newly generated AMI from the previous steps
Stop the app in the Management Console (at port 8800) if it is already running
Ensure that the hostname configured in the Management Console at port 8800 reflects the correct address. If this hostname has changed, you will also need to change it in the corresponding GitHub OAuth application settings or create a new OAuth app to test the recovery and log in to the application.
Update any references to the backed-up instance’s public and private IP addresses in
/etc/default/replicated-operatoron Debian/Ubuntu or
/etc/sysconfig/*in RHEL/CentOS to the new IP addresses.
From the root directory of the Services box, run
sudo rm -rf /opt/nomad. State is saved in the
/opt/nomadfolder that can interfere with builds running when an installation is restored from a backup. The folder and its contents will be regenerated by Nomad when it starts.
Restart the app in the Management Console at port 8800.
Cleaning up Build Records
While filesystem-level data integrity issues are rare and preventable, there will likely be some data anomalies in a point-in-time backup taken while builds are running on the system. For example, a build that is only half-way finished at backup time may result in missing the latter half of its command output, and it may permanently show that it is in Running state in the application.
If you want to clean up any abnormal build records in your database after a recovery, you can delete them by running the following commands on the Services machine replacing the example build URL with an actual URL from your CircleCI application:
circleci dev-console # Wait for console to load user=> (admin/delete-build "https://my-circleci-hostname.com/gh/my-org/my-project/1234")