CircleCI Server v3.x Backup and Restore

Overview

Backup and restore is available for server v3.1.0 and up.

While operating and administering CircleCI server, you will need a plan for maintaining backups and recovering your installation, whether to migrate it to another cluster or to recover from a critical event. This document outlines recommendations for backing up and restoring your CircleCI server instance data and state.

CircleCI server is administered via Kots, which uses Velero for backup and restore. The benefit of this approach is that it not only restores your application’s data, but it also restores the state of the Kubernetes cluster and its resources at the time of the backup. In this way, we can also restore admin-console configurations and customizations you made to your cluster.

Backup and restore of the CircleCI services is dependent on Velero. If your cluster is lost, you will not be able to restore CircleCI until you have successfully brought up Velero in the cluster. From there you can recover the CircleCI services.

The setup

Backups of CircleCI server can be created quite easily through Kots. However, to enable backup support you will need to install and configure Velero on your cluster. The following sections outline the steps needed to install Velero on your cluster.

Prerequisites

  • Download and install the Velero CLI for your environment.

AWS Prerequisites

  • AWS CLI is installed.

GCP Prerequisites

  • gcloud and gsutil are installed. Both are included in the Google Cloud SDK; refer to its documentation for installation instructions.

For more information, see Velero’s supported providers documentation.

Below, you will find instructions for creating a server 3.x backup on AWS, GCP, and S3-compatible storage.

S3 Compatible Storage Prerequisites

  • The MinIO client (mc) is installed and configured for your storage provider.

Server 3.x backups on AWS

The following steps assume that AWS is your provider and that you have met the prerequisites listed above.

These instructions were sourced from the Velero documentation here.

Step 1 - Create an AWS S3 bucket

BUCKET=<YOUR_BUCKET>
REGION=<YOUR_REGION>
aws s3api create-bucket \
    --bucket $BUCKET \
    --region $REGION \
    --create-bucket-configuration LocationConstraint=$REGION
us-east-1 does not support a LocationConstraint. If your region is us-east-1, omit the bucket configuration.
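
The region rule above can be captured in a small wrapper. The following is a minimal sketch, not part of the Velero instructions: build_create_bucket_cmd is a hypothetical helper that only prints the command it would run, and my-circleci-backups is a placeholder bucket name.

```shell
# build_create_bucket_cmd is a hypothetical helper, not part of the AWS CLI.
# It prints (does not run) the create-bucket command for a bucket and region.
build_create_bucket_cmd() {
    bucket="$1"
    region="$2"
    cmd="aws s3api create-bucket --bucket $bucket --region $region"
    # us-east-1 does not accept a LocationConstraint, so add it only elsewhere.
    if [ "$region" != "us-east-1" ]; then
        cmd="$cmd --create-bucket-configuration LocationConstraint=$region"
    fi
    printf '%s\n' "$cmd"
}

build_create_bucket_cmd my-circleci-backups us-east-1
build_create_bucket_cmd my-circleci-backups eu-west-1
```

Printing the command first makes it easy to review before running it by hand.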

Step 2 - Set up permissions for Velero

  • Create an IAM user

aws iam create-user --user-name velero
  • Attach policies to give user velero the necessary permissions:

cat > velero-policy.json <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeVolumes",
                "ec2:DescribeSnapshots",
                "ec2:CreateTags",
                "ec2:CreateVolume",
                "ec2:CreateSnapshot",
                "ec2:DeleteSnapshot"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:DeleteObject",
                "s3:PutObject",
                "s3:AbortMultipartUpload",
                "s3:ListMultipartUploadParts"
            ],
            "Resource": [
                "arn:aws:s3:::${BUCKET}/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::${BUCKET}"
            ]
        }
    ]
}
EOF
aws iam put-user-policy \
  --user-name velero \
  --policy-name velero \
  --policy-document file://velero-policy.json
  • Create an access key for user velero

aws iam create-access-key --user-name velero

The result should look like this:

{
  "AccessKey": {
        "UserName": "velero",
        "Status": "Active",
        "CreateDate": "2017-07-31T22:24:41.576Z",
        "SecretAccessKey": <AWS_SECRET_ACCESS_KEY>,
        "AccessKeyId": <AWS_ACCESS_KEY_ID>
  }
}
  • Create a Velero-specific credentials file (e.g., ./credentials-velero) in your local directory, with the following contents:

[default]
aws_access_key_id=<AWS_ACCESS_KEY_ID>
aws_secret_access_key=<AWS_SECRET_ACCESS_KEY>

where the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY placeholders are values returned from the create-access-key request in the previous step.
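
Because this file holds a secret, it may be worth creating it with restrictive permissions. The sketch below is one possible way to do that; write_velero_credentials is a hypothetical helper name and is not provided by Velero or the AWS CLI.

```shell
# write_velero_credentials is a hypothetical helper, not part of Velero.
# Usage: write_velero_credentials <file> <access-key-id> <secret-access-key>
write_velero_credentials() {
    file="$1"
    key_id="$2"
    secret="$3"
    # Refuse to write a credentials file with empty values.
    if [ -z "$file" ] || [ -z "$key_id" ] || [ -z "$secret" ]; then
        echo "usage: write_velero_credentials <file> <key-id> <secret>" >&2
        return 1
    fi
    # The subshell keeps the restrictive umask from leaking into the caller.
    # A newly created file ends up readable and writable by its owner only.
    (
        umask 077
        printf '[default]\naws_access_key_id=%s\naws_secret_access_key=%s\n' \
            "$key_id" "$secret" > "$file"
    )
}

# Example call (the values are placeholders):
# write_velero_credentials ./credentials-velero "<AWS_ACCESS_KEY_ID>" "<AWS_SECRET_ACCESS_KEY>"
```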

Step 3 - Install and start Velero

  • Run the following velero install command. This will create a namespace called velero and install all the necessary resources to run Velero. Make sure that you pass the correct file name containing the AWS credentials that you have created in Step 2.

kots backups require restic to operate. When installing Velero, ensure that you have the --use-restic flag set, as shown below:
velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.2.0 \
    --bucket $BUCKET \
    --backup-location-config region=$REGION \
    --snapshot-location-config region=$REGION \
    --secret-file ./credentials-velero \
    --use-restic \
    --wait
  • Once Velero is installed on your cluster, check the new velero namespace. You should have a Velero deployment and a restic daemonset, e.g.:

$ kubectl get pods --namespace velero
NAME                      READY   STATUS    RESTARTS   AGE
restic-5vlww              1/1     Running   0          2m
restic-94ptv              1/1     Running   0          2m
restic-ch6m9              1/1     Running   0          2m
restic-mknws              1/1     Running   0          2m
velero-68788b675c-dm2s7   1/1     Running   0          2m

As restic is a daemonset, there should be one pod for each node in your Kubernetes cluster.
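
To check the one-pod-per-node expectation without counting by eye, the pod listing can be piped through a small filter. This is a sketch only: count_running_restic is a hypothetical helper, and the sample input is the listing shown above.

```shell
# count_running_restic is a hypothetical helper. It reads `kubectl get pods`
# output on stdin and prints how many restic pods report STATUS "Running".
count_running_restic() {
    awk '$1 ~ /^restic-/ && $3 == "Running" { n++ } END { print n + 0 }'
}

# Sample input copied from the listing above.
count_running_restic <<'EOF'
NAME                      READY   STATUS    RESTARTS   AGE
restic-5vlww              1/1     Running   0          2m
restic-94ptv              1/1     Running   0          2m
restic-ch6m9              1/1     Running   0          2m
restic-mknws              1/1     Running   0          2m
velero-68788b675c-dm2s7   1/1     Running   0          2m
EOF
# prints 4
```

On a live cluster, the same filter can be fed with kubectl get pods --namespace velero, and the result compared with the output of kubectl get nodes --no-headers | wc -l. Nodes with taints may legitimately run no restic pod, so a mismatch is a hint rather than proof of a problem.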

Server 3.x backups on GCP

The following steps are specific to Google Cloud Platform and assume that you have met the prerequisites.

These instructions were sourced from the documentation for the Velero GCP plugin here.

Step 1 - Create a GCP bucket

To reduce the chance of typos, we will set some of the parameters as shell variables. Should you be unable to complete all the steps in the same session, do not forget to reset variables as necessary before proceeding. In the step below, for example, we will define a variable for your bucket name. Replace the <YOUR_BUCKET> placeholder with the name of the bucket you want to create for your backups.

BUCKET=<YOUR_BUCKET>

gsutil mb gs://$BUCKET/
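
Since the later steps rely on shell variables that are lost when a session ends, a quick guard can catch a missing one before a command runs with an empty value. The following is a minimal sketch; require_vars is a hypothetical helper, and the variable names in the example are the ones defined in this section.

```shell
# require_vars is a hypothetical helper. It returns non-zero and names every
# variable from its argument list that is unset or empty.
require_vars() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable called $name (POSIX-compatible).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example: run this before the commands in the following steps.
# require_vars BUCKET PROJECT_ID SERVICE_ACCOUNT_EMAIL || echo "set the variables first"
```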

Step 2 - Set up permissions for Velero

If your server installation runs within a GKE cluster, ensure that your current IAM user is a cluster admin for this cluster, as RBAC objects need to be created. More information can be found in the GKE documentation.

  1. Set a shell variable for your project ID. Before doing so, make sure that your gcloud CLI points to the correct project by checking the current configuration:

    gcloud config list
  2. If the project is correct, set the variable:

    PROJECT_ID=$(gcloud config get-value project)
  3. Create a service account:

    gcloud iam service-accounts create velero \
        --display-name "Velero service account"
    If you run several clusters with Velero, you might want to consider using a more specific name for the Service Account besides velero, as suggested here.
  4. You can check if the service account has been created successfully by running:

    gcloud iam service-accounts list
  5. Next, store the email address for the Service Account in a variable:

    SERVICE_ACCOUNT_EMAIL=$(gcloud iam service-accounts list \
      --filter="displayName:Velero service account" \
      --format 'value(email)')

    Modify the command as needed to match the display name you have chosen for your Service Account.

  6. Grant the necessary permissions to the Service Account:

    ROLE_PERMISSIONS=(
        compute.disks.get
        compute.disks.create
        compute.disks.createSnapshot
        compute.snapshots.get
        compute.snapshots.create
        compute.snapshots.useReadOnly
        compute.snapshots.delete
        compute.zones.get
    )
    
    gcloud iam roles create velero.server \
        --project $PROJECT_ID \
        --title "Velero Server" \
        --permissions "$(IFS=","; echo "${ROLE_PERMISSIONS[*]}")"
    
    gcloud projects add-iam-policy-binding $PROJECT_ID \
        --member serviceAccount:$SERVICE_ACCOUNT_EMAIL \
        --role projects/$PROJECT_ID/roles/velero.server
    
    gsutil iam ch serviceAccount:$SERVICE_ACCOUNT_EMAIL:objectAdmin gs://${BUCKET}
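
The --permissions argument above relies on a shell idiom: setting IFS to a comma before expanding a list joins the items with commas. The sketch below shows the same idea in a POSIX-compatible form; join_by_comma is a hypothetical helper used only for illustration.

```shell
# join_by_comma is a hypothetical helper. The function body runs in a
# subshell, so changing IFS here does not affect the calling shell.
join_by_comma() (
    IFS=","
    # "$*" joins all arguments using the first character of IFS.
    printf '%s\n' "$*"
)

join_by_comma compute.disks.get compute.disks.create compute.zones.get
# prints compute.disks.get,compute.disks.create,compute.zones.get
```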

Now, you need to ensure that Velero can use this Service Account.

Option 1: JSON key file

You can simply pass a JSON credentials file to Velero to authorize it to perform actions as the Service Account. To do this, we first need to create a key:

gcloud iam service-accounts keys create credentials-velero \
    --iam-account $SERVICE_ACCOUNT_EMAIL

After running this, you should have a file named credentials-velero in your local working directory.

Option 2: Workload Identities

If you are already using Workload Identities in your cluster, you can bind the GCP Service Account you just created to Velero’s Kubernetes service account. In this case, the GCP Service Account will need the iam.serviceAccounts.signBlob permission in addition to the permissions already specified above.

Step 3 - Install and start Velero

  • Run one of the following velero install commands, depending on how you authorized the service account. This will create a namespace called velero and install all the necessary resources to run Velero.

kots backups require restic to operate. When installing Velero, ensure that you have the --use-restic flag set.

If using a JSON key file

velero install \
    --provider gcp \
    --plugins velero/velero-plugin-for-gcp:v1.2.0 \
    --bucket $BUCKET \
    --secret-file ./credentials-velero \
    --use-restic \
    --wait

If using Workload Identities

velero install \
    --provider gcp \
    --plugins velero/velero-plugin-for-gcp:v1.2.0 \
    --bucket $BUCKET \
    --no-secret \
    --sa-annotations iam.gke.io/gcp-service-account=$SERVICE_ACCOUNT_EMAIL \
    --backup-location-config serviceAccount=$SERVICE_ACCOUNT_EMAIL \
    --use-restic \
    --wait

For more options on customizing your installation, refer to the Velero documentation.
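
The two variants above differ only in their authorization flags. As an illustration, the sketch below assembles the command for either mode and prints it for review. velero_gcp_install_cmd is a hypothetical helper, and the mode names keyfile and workload-identity are labels chosen for this example; every flag is taken from the commands above.

```shell
# velero_gcp_install_cmd is a hypothetical helper. It prints (does not run)
# the install command for the chosen authorization mode.
# Usage: velero_gcp_install_cmd <keyfile|workload-identity> <bucket> [sa-email]
velero_gcp_install_cmd() {
    mode="$1"
    bucket="$2"
    sa_email="${3:-}"
    cmd="velero install --provider gcp"
    cmd="$cmd --plugins velero/velero-plugin-for-gcp:v1.2.0 --bucket $bucket"
    case "$mode" in
        keyfile)
            cmd="$cmd --secret-file ./credentials-velero"
            ;;
        workload-identity)
            cmd="$cmd --no-secret"
            cmd="$cmd --sa-annotations iam.gke.io/gcp-service-account=$sa_email"
            cmd="$cmd --backup-location-config serviceAccount=$sa_email"
            ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1
            ;;
    esac
    # Both variants need restic for kots backups.
    printf '%s\n' "$cmd --use-restic --wait"
}

velero_gcp_install_cmd keyfile my-circleci-backups
```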

  • Once Velero is installed on your cluster, check the new velero namespace. You should have a Velero deployment and a restic daemonset, e.g.:

$ kubectl get pods --namespace velero
NAME                      READY   STATUS    RESTARTS   AGE
restic-5vlww              1/1     Running   0          2m
restic-94ptv              1/1     Running   0          2m
restic-ch6m9              1/1     Running   0          2m
restic-mknws              1/1     Running   0          2m
velero-68788b675c-dm2s7   1/1     Running   0          2m

As restic is a daemonset, there should be one pod for each node in your Kubernetes cluster.

Server 3.x backups with S3 Compatible Storage

The following steps assume that you are using S3-compatible object storage, not necessarily AWS S3, for your backups, and that you have met the prerequisites.

These instructions were sourced from the Velero documentation here.

Step 1 - Configure mc client

To start, configure mc to connect to your storage provider:

# Alias can be any name as long as you use the same value in subsequent commands
export ALIAS=my-provider
mc alias set $ALIAS <YOUR_MINIO_ENDPOINT> <YOUR_MINIO_ACCESS_KEY_ID> <YOUR_MINIO_SECRET_ACCESS_KEY>

You can verify that your client is correctly configured by running mc ls my-provider. The output should list the buckets in your provider.

Step 2 - Create a bucket

Create a bucket for your backups. It is important that a new bucket is used, as Velero cannot use a preexisting bucket with other content.

mc mb ${ALIAS}/<YOUR_BUCKET>

Step 3 - Create a user and policy

Next, create a user and policy for Velero to access your bucket.

In the following snippet, <YOUR_VELERO_ACCESS_KEY_ID> and <YOUR_VELERO_SECRET_ACCESS_KEY> refer to the credentials that Velero will use to access MinIO.

# Create user
mc admin user add $ALIAS <YOUR_VELERO_ACCESS_KEY_ID> <YOUR_VELERO_SECRET_ACCESS_KEY>

# Create policy
cat > velero-policy.json << EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:*"
      ],
      "Resource": [
        "arn:aws:s3:::<YOUR_BUCKET>",
        "arn:aws:s3:::<YOUR_BUCKET>/*"
      ]
    }
  ]
}
EOF

mc admin policy add $ALIAS velero-policy velero-policy.json

# Bind user to policy
mc admin policy set $ALIAS velero-policy user=<YOUR_VELERO_ACCESS_KEY_ID>

Finally, we add our new user’s credentials to a file (./credentials-velero in this example) with the following contents:

[default]
aws_access_key_id=<YOUR_VELERO_ACCESS_KEY_ID>
aws_secret_access_key=<YOUR_VELERO_SECRET_ACCESS_KEY>
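
The policy document in this step names the bucket in two places, which is easy to get out of sync when editing by hand. As a sketch, the document can be generated from a single argument instead. render_velero_policy is a hypothetical helper and my-circleci-backups is a placeholder bucket name.

```shell
# render_velero_policy is a hypothetical helper. It prints the policy
# document from this step with the bucket name filled in.
render_velero_policy() {
    bucket="$1"
    cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:*"
      ],
      "Resource": [
        "arn:aws:s3:::${bucket}",
        "arn:aws:s3:::${bucket}/*"
      ]
    }
  ]
}
EOF
}

render_velero_policy my-circleci-backups
```

Redirecting the output to velero-policy.json produces the file expected by the mc admin policy add command.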

Step 4 - Install and start Velero

Run the following velero install command. This will create a namespace called velero and install all the necessary resources to run Velero.

kots backups require restic to operate. When installing Velero, ensure that you have the --use-restic flag set, as shown below:
velero install --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.2.0 \
  --bucket <YOUR_BUCKET> \
  --secret-file ./credentials-velero \
  --use-volume-snapshots=false \
  --use-restic \
  --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=<YOUR_MINIO_ENDPOINT> \
  --wait

Once Velero is installed on your cluster, check the new velero namespace. You should have a Velero deployment and a restic daemonset, e.g.:

$ kubectl get pods --namespace velero
NAME                      READY   STATUS    RESTARTS   AGE
restic-5vlww              1/1     Running   0          2m
restic-94ptv              1/1     Running   0          2m
restic-ch6m9              1/1     Running   0          2m
restic-mknws              1/1     Running   0          2m
velero-68788b675c-dm2s7   1/1     Running   0          2m

As restic is a daemonset, there should be one pod for each node in your Kubernetes cluster.

Creating backups

Now that Velero is installed on your cluster, you should see the snapshots option in the navbar of the management console.

Kots Navbar

If you see this option, you are ready to create your first backup. If you do not see this option, please refer to the troubleshooting section.

Option 1 - Create a backup with kots CLI

To create the backup, run:

kubectl kots backup --namespace <your namespace>

Option 2 - Create a backup with kots admin console

Select Snapshots from the navbar. The default selection should be Full Snapshots, which is recommended.

Kots Navbar

Select the Start a snapshot button.

Kots Create Snapshot

Restoring backups

Option 1 - Restore a backup from a snapshot

To restore from a backup stored in your S3-compatible storage, ensure that Velero is installed on your Kubernetes cluster and has access to the storage bucket containing the backups. When using EKS, restoring CircleCI server requires that an instance of CircleCI server is installed beforehand. When using GKE or other platforms, a cluster with only Velero installed may work.

If this is a new cluster or if you need to re-install Velero, the installation should be done with the same credentials generated above.

Option 2 - Restore a backup using the kots CLI

To restore a backup using the kots CLI, run the following to get a list of backups:

kubectl kots get backups

Using a backup name from the previous command, run the following to start the restore process:

kubectl kots restore --from-backup <backup-instance-id>

Option 3 - Restore a backup using the kots administration console UI

As with backups, navigate to Snapshots in kots admin. Now you should see a list of all your backups, each with a restore icon. Choose the backup you wish to use and select restore.

Kots Create Snapshot

The restore will create new load balancers for CircleCI’s services. As a result, you will need to update either your DNS records or the hostname configurations in the kots admin console. You may also need to update the nomad server endpoint provided to your nomad clients.

If you are using pre-existing nomad clients, you will need to restart them before they will connect to the nomad-server cluster.

It should take roughly 10-15 minutes for CircleCI server to be restored and operational.

Optional - Scheduling backups with kots

To schedule regular backups, select Snapshots, and then Settings & Schedule from the kots administration console.

Snapshots Selected

Here you can find the configuration options for your snapshots, including scheduling.

Snapshot Settings

Troubleshooting Backups and Restoration

Snapshots are not available in kots admin console

If your kots admin console does not display the snapshot option, you may try the following:

  • Confirm that your version of kots supports snapshots. At this time, we recommend v1.40.0 or above:

$ kubectl kots version
Replicated KOTS 1.40.0
  • Check that Velero is deployed and running correctly. You may check the Velero logs with the command below.

$ kubectl logs deployment/velero --namespace velero

You may need to reinstall Velero as a result.

  • Confirm that snapshots are available on your license. You may reach out to our Customer Support Team to validate this.

Errors occur during backup or restore process

If you experience an error during the backup or restore process, first check the Velero logs using the command above. 4XX errors in the logs are likely caused by issues with your storage bucket access.

  • Confirm that your bucket exists and is in the region you expect.

  • Then confirm that the credentials provided to Velero can be used to access the bucket.

  • You may need to run the command to install Velero again, this time with updated bucket info.
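
To narrow a long Velero log down to likely bucket access failures, the log can be piped through a simple filter. This is a rough sketch under one assumption: that the HTTP status code appears in the log line as a standalone number. filter_http_4xx is a hypothetical helper, and it may also match unrelated numbers in the 400 to 499 range.

```shell
# filter_http_4xx is a hypothetical helper. It reads log lines on stdin and
# keeps those containing a standalone three-digit number from 400 to 499.
# Assumption: the status code appears as its own number in the log line.
filter_http_4xx() {
    # "|| true" keeps the exit status at zero when nothing matches.
    grep -E '(^|[^0-9])4[0-9]{2}([^0-9]|$)' || true
}

# Example against a live cluster:
# kubectl logs deployment/velero --namespace velero | filter_http_4xx
```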

You may also check the status of pods in the velero namespace.

$ kubectl get pods --namespace velero
NAME                      READY   STATUS    RESTARTS   AGE
restic-5vlww              0/1     Pending   0          10m
restic-94ptv              1/1     Running   0          10m
restic-ch6m9              0/1     Pending   0          10m
restic-mknws              1/1     Running   0          10m
velero-68788b675c-dm2s7   1/1     Running   0          10m

In the above example, some restic pods are pending, which means they are waiting for a node to have available CPU or memory resources. You may need to scale your nodes to accommodate restic in this case.
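
On clusters with many nodes, it can help to list only the pods that are not running. The following is a minimal sketch; list_not_running is a hypothetical helper that works on the standard kubectl get pods table.

```shell
# list_not_running is a hypothetical helper. It reads `kubectl get pods`
# output on stdin, skips the header row, and prints "name status" for each
# pod whose STATUS column is not "Running".
list_not_running() {
    awk 'NR > 1 && $3 != "Running" { print $1, $3 }'
}

# Example against a live cluster:
# kubectl get pods --namespace velero | list_not_running
```

For each pod it reports, kubectl describe pod <pod-name> --namespace velero shows the scheduling events, which usually state which resource the pod is waiting for.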


