Schedule database backups for MongoDB in a Node.js application
Database backup protects your data by creating a copy of your database locally, or remotely on a backup server. This operation is often performed manually by database administrators. Like every other human-dependent activity, it is susceptible to errors and requires lots of time.
Regularly scheduled backups go a long way toward safeguarding your customers’ details in the event of an operating system failure or a security breach. In this tutorial, I will guide you through the process of backing up your application’s database at a defined regular interval using scheduled pipelines.
To get a good grasp of the automated database backup process, we will set up a database backup for a Node.js application with a MongoDB database. This application will be deployed to Heroku using a CircleCI deployment pipeline. The MongoDB database will be hosted on MongoDB Atlas, a multi-cloud platform for database hosting and deployment.
For easy access, the generated, backed-up MongoDB collection for our application will be stored on Microsoft Azure Storage.
Prerequisites
Here is what you need to follow this tutorial successfully:
- Node.js installed on your computer
- A CircleCI account
- A GitHub account
- A Heroku account
- A free MongoDB Atlas account or its equivalent
- An API testing tool, such as Postman
- An Azure account
Cloning the demo application
To get started, run this command to clone the demo application:
git clone https://github.com/yemiwebby/db-cleanup-starter.git db-back-up-schedule
Next, move into the newly cloned app and install all its dependencies:
cd db-back-up-schedule
npm install
This application contains these endpoints:
- `new-company` creates a new company by specifying the name of the company and its founder.
- `companies` retrieves the list of companies from the database.
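The logic behind these two endpoints can be sketched with an in-memory store. The function names and the validation rule below are illustrative assumptions, not the starter project’s actual code:

```javascript
// In-memory stand-in for the MongoDB "companies" collection.
const companies = [];

// POST /new-company — expects a body like { name, founder }
function createCompany(body) {
  if (!body || !body.name || !body.founder) {
    throw new Error("Both 'name' and 'founder' are required");
  }
  const company = { name: body.name, founder: body.founder };
  companies.push(company);
  return company;
}

// GET /companies — returns every stored company
function listCompanies() {
  return companies;
}
```

In the real application these handlers read from and write to the MongoDB collection instead of an array, but the request and response shapes are the same.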
When the installation process has finished, create a .env
file and populate it with this:
MONGODB_URI=YOUR_MONGODB_URL
If you would rather, you can simply run this command to copy the content from the `.env.sample` file within the starter project:
cp .env.sample .env
Of course, you need to replace the `YOUR_MONGODB_URL` placeholder with the connection string for your remote MongoDB instance. This tutorial uses a MongoDB Atlas database, which is easy to set up; I will explain how in the next section.
Creating a MongoDB Atlas account and database
Create a free Atlas account here and follow the instructions to deploy a free tier cluster. Once you have a cluster and database user set up, open and edit the .env
file.
Replace the YOUR_MONGODB_URL
placeholder with the extracted connection string from your MongoDB Atlas dashboard:
MONGODB_URI=mongodb+srv://<username>:<password>@<clustername>.mongodb.net/<dbname>?retryWrites=true&w=majority
Replace the `<username>`, `<password>`, `<clustername>`, and `<dbname>` placeholders with the values for your cluster.
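To avoid typos while filling in the placeholders, the string can also be assembled programmatically. This helper is my own illustration (the credential values below are made up); note that `encodeURIComponent` protects you when the password contains special characters:

```javascript
// Assemble an Atlas connection string from its parts.
// All values passed in below are placeholders — use your own cluster's details.
function buildMongoUri({ username, password, clustername, dbname }) {
  return (
    `mongodb+srv://${username}:${encodeURIComponent(password)}` +
    `@${clustername}.mongodb.net/${dbname}?retryWrites=true&w=majority`
  );
}

console.log(
  buildMongoUri({
    username: "appUser",
    password: "s3cret",
    clustername: "cluster0",
    dbname: "companiesdb",
  })
);
```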
Running the demo application
When the database is properly created and configured, open a terminal and run the demo application with:
npm run start
You will get this output:
> db-back-up-schedule@1.0.0 start
> node server.js
Server is running at port 3000
Connected successfully
Creating a company
Test the demo application by creating new company details. Open up Postman or your preferred API testing tool. Send a POST
request to the http://localhost:3000/new-company
endpoint using this JSON data:
{
"name": "Facebook",
"founder": "Mark"
}
Viewing the list of companies
Next, send a GET
request to http://localhost:3000/companies
to retrieve that list of companies.
Creating an application on Heroku
Next, create a new application on Heroku to host and run the Node.js project. Go to the Heroku dashboard to begin. Click New and then New App. Fill in the form with a name for your application and your region.
Note: Application names on Heroku are unique. Pick one that is available and make a note of it.
Click the Create app button. You will be redirected to the Deploy view of your newly created application.
Next, create a configuration variable to reference the MongoDB URI that was extracted from the MongoDB Atlas dashboard earlier. To do that, navigate to the Settings page, scroll down, and click the Reveal Config Vars button.
Specify the key and value as shown here, and click Add once you are done.
Lastly, you need to retrieve the API key for your Heroku account. This key will be used to connect your CircleCI pipeline to Heroku. To get your API key, open the Account Settings page.
Scroll to the API keys section.
Click the Reveal button and copy the API key. Save it somewhere you can easily find it later.
Creating an Azure storage account
As mentioned, Microsoft Azure storage will be used to host the backed-up MongoDB collection for our database. To do this, you need to sign up for a free account on Azure. Then go to your Azure portal dashboard.
Click Storage accounts from the list of services or use the search feature by typing “storage” in the search bar.
From the storage account page, click Create. On this new page, specify the details for your storage account.
Next, do this:
- Select a subscription.
- Select an existing resource group or create a new one.
- Enter a storage account name. For this tutorial, I named mine `dbblobs`.
- Select a region close to you.
Click Review + Create, then click the Create button. Your storage account will be created and deployed.
It is worth mentioning that an Azure blob storage account offers these resources:
- The storage account, which was just created.
- The container, which helps organize a set of blobs, similar to a directory in a file system.
- The blob, which is usually stored in a container, similar to a file stored in a directory.
At this point, you have a functioning storage account. The next thing to do is create a container to house your blobs (MongoDB collection backup, in our case). On your new storage account page, click Containers from the side menu bar. Then click + Container to create a new container.
Give your container a name and change the public access level. Click the Create button once you are done.
Retrieving the access key
To easily establish a remote connection to either store or retrieve files from your storage account, you need an access key. By default, each storage account on Azure comes with two different access keys, which allows you to replace one while using the other. To reveal your keys, click Show keys and copy one of the keys, preferably the first one.
Paste the key somewhere safe on your computer; you will need it later.
Adding the pipeline configuration script
Next, you need to add the pipeline configuration for CircleCI. The pipeline will consist of steps to install the project’s dependencies and compile the application for production.
At the root of your project, create a folder named .circleci
. In that folder, create a file named config.yml
. In the newly created file, add this configuration:
version: 2.1
orbs:
heroku: circleci/heroku@1.2.6
jobs:
build:
executor: heroku/default
steps:
- checkout
- heroku/install
- heroku/deploy-via-git:
force: true
workflows:
deploy:
jobs:
- build
This configuration pulls in the Heroku orb circleci/heroku
, which automatically provides access to a robust set of Heroku jobs and commands. One of those jobs is heroku/deploy-via-git
, which deploys your application straight from your GitHub repo to your Heroku account.
Next, set up a repository on GitHub and link the project to CircleCI. Review Pushing a project to GitHub for step-by-step instructions.
Log in to your CircleCI account. If you signed up with your GitHub account, all your repositories will be available on your project’s dashboard.
Click Set Up Project for your db-clean-up
project.
You will be prompted with a couple of options for the configuration file. Select the use the .circleci/config.yml in my repo
option. Enter the name of the branch where your code is housed on GitHub, then click the Set Up Project button.
Your first workflow will start running, but it will fail. This is because you have not provided your Heroku API key. You can fix that now.
Click the Project Settings button, then click Environment Variables. Add these two new variables:
- `HEROKU_APP_NAME` is the app name in Heroku (`db-clean-up`).
- `HEROKU_API_KEY` is the Heroku API key that you retrieved from the account settings page.
Select Rerun Workflow from Failed to rerun the Heroku deployment. This time, your workflow will run successfully.
To confirm that your workflow was successful, you can open your newly deployed app in your browser. The URL for your application should be in this format: `https://<HEROKU_APP_NAME>.herokuapp.com/`.
Here is a quick recap of what you have done and learned so far. You have:
- Created a working application locally
- Created a functioning application on Heroku
- Created a Microsoft Azure blob storage account
- Successfully set up a pipeline to automate the deployment of your application to Heroku using CircleCI
Generating and uploading the backup file
MongoDB stores data records as documents; specifically BSON documents, which are gathered together in collections.
In this section, you will create a script to generate the database backup file (BSON document) for your project and also upload the file to Microsoft Azure. To do this, we will use two different tools:
- [mongodump](https://docs.mongodb.com/database-tools/mongodump/) is a utility that creates a binary export of the contents of a database by running a simple command. It is part of the MongoDB Database Tools package and will be installed once you deploy your application on CircleCI.
- [Azure Storage Blob](https://www.npmjs.com/package/@azure/storage-blob) is a JavaScript library that makes it easy to consume the Microsoft Azure Storage blob service from a Node.js application. This library has already been included and installed as a dependency for our project in this tutorial.
To generate and upload the backup file, create a new file named backup.js
at the root of the application and use this content for it:
require("dotenv").config();
const exec = require("child_process").exec;
const path = require("path");
const {
BlobServiceClient,
StorageSharedKeyCredential,
} = require("@azure/storage-blob");
const backupDirPath = path.join(__dirname, "database-backup");
const storeFileOnAzure = async (file) => {
const account = process.env.ACCOUNT_NAME;
const accountKey = process.env.ACCOUNT_KEY;
const containerName = "dbsnapshots";
const sharedKeyCredential = new StorageSharedKeyCredential(
account,
accountKey
);
// instantiate Client
const blobServiceClient = new BlobServiceClient(
`https://${account}.blob.core.windows.net`,
sharedKeyCredential
);
const container = blobServiceClient.getContainerClient(containerName);
const blobName = "companies.bson";
const blockBlobClient = container.getBlockBlobClient(blobName);
const uploadBlobResponse = await blockBlobClient.uploadFile(file);
console.log(
`Upload block blob ${blobName} successfully`,
uploadBlobResponse.requestId
);
};
let cmd = `mongodump --forceTableScan --out=${backupDirPath} --uri=${process.env.MONGODB_URI}`;
const dbAutoBackUp = () => {
let filePath = backupDirPath + `/db-back-up-schedule/companies.bson`;
exec(cmd, (error, stdout, stderr) => {
console.log([cmd, error, backupDirPath]);
storeFileOnAzure(filePath);
});
};
dbAutoBackUp();
The content in this file imports the required dependencies, including the Azure storage client SDK, and specifies the path where the backup file will be housed. Next, it creates:
- `cmd`, a `mongodump` command that will be executed to generate the backup file. The `--out` flag specifies the path to the folder where the file will be housed, while `--uri` specifies the MongoDB connection string.
- The `storeFileOnAzure()` function, which takes the absolute path of the backup file and uploads it to the Azure storage container created earlier, using the Azure Storage Blob client library.
- The `dbAutoBackUp()` function, which uses Node’s `exec()` function to create a new shell and execute the specified `mongodump` command. The `filePath` variable references the exact location of the generated BSON file (`companies.bson` in this case).
Note: `companiesdb` and `companies.bson` represent the database name and collection name for the application, as seen on MongoDB Atlas. So, if your database name is `userdb` and your collection name is `users`, then your file path would point to the `userdb/users.bson` file.
Creating and implementing a scheduled pipeline
There are two different options for setting up scheduled pipelines from scratch:
- Using the API
- Using project settings
In this tutorial, we will use the API, so you will need:
- CircleCI API token
- The name of the version control system where your repository is hosted
- Your organization name
- Current project ID on CircleCI
To get the token, go to your CircleCI dashboard and click your avatar:
You will be redirected to your User Settings page. From there, navigate to Personal API Tokens, create a new token, give your token a name and save it somewhere safe.
Now, open the .env
file from the root of your project and add:
VCS_TYPE=VERSION_CONTROL_SYSTEM
ORG_NAME=ORGANISATION_NAME
PROJECT_ID=PROJECT_ID
CIRCLECI_TOKEN=YOUR_CIRCLECI_TOKEN
MONGODB_URI=YOUR_MONGODB_URL
Replace the placeholders with your values:
- `VCS_TYPE` is your version control system, such as `github`.
- `ORG_NAME` is your GitHub username or organization name.
- `PROJECT_ID` is your project ID on CircleCI. It is `db-clean-up` for the sample project.
- `CIRCLECI_TOKEN` is your CircleCI token.
- `MONGODB_URI` is your MongoDB URI string, as extracted from the MongoDB Atlas dashboard.
The next thing to do is create a new file named schedule.js
within the root of your project and use this content for it:
const axios = require("axios").default;
require("dotenv").config();
const API_BASE_URL = "https://circleci.com/api/v2/project";
const vcs = process.env.VCS_TYPE;
const org = process.env.ORG_NAME;
const project = process.env.PROJECT_ID;
const token = process.env.CIRCLECI_TOKEN;
const postScheduleEndpoint = `${API_BASE_URL}/${vcs}/${org}/${project}/schedule`;
async function scheduleDatabaseBackup() {
try {
let res = await axios.post(
postScheduleEndpoint,
{
name: "Database backup",
description: "Schedule database backup for your app in production",
"attribution-actor": "current",
parameters: {
branch: "main",
"run-schedule": true,
},
timetable: {
"per-hour": 30,
"hours-of-day": [
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
19, 20, 21, 22, 23,
],
"days-of-week": ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"],
},
},
{
headers: { "circle-token": token },
}
);
console.log(res.data);
} catch (error) {
console.log(error.response);
}
}
scheduleDatabaseBackup();
This code creates a function named `scheduleDatabaseBackup()` that posts the pipeline schedule details to the CircleCI API.
The payload includes:
- `name`, which is the schedule name. It needs to be unique.
- `description`, an optional field used to describe the schedule.
- `attribution-actor`, which can be either `system` for a neutral actor or `current`, which takes your current user’s permissions (as per the token you use).
- The `parameters` object, which specifies the branch to trigger. It includes an additional value for checking when to run the pipeline.
- `timetable`, which defines when and how frequently to run the scheduled pipelines. The fields to use here are `per-hour`, `hours-of-day`, and `days-of-week`.
Note that `timetable` does not take a cron expression, which makes it easier for humans to read and reason about when working with the API. For this tutorial, the schedule is set to run 30 times within an hour, which is about once every 2 minutes.
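As a quick sanity check on those timetable values, the effective interval implied by `per-hour` can be computed directly:

```javascript
// per-hour runs are spread evenly across each hour,
// so the approximate gap between runs is 60 / per-hour minutes.
function approximateIntervalMinutes(perHour) {
  return 60 / perHour;
}

console.log(approximateIntervalMinutes(30)); // 2
```

A `per-hour` of 30 therefore means one run roughly every 2 minutes, which is deliberately aggressive here so you can see results quickly; a real backup schedule would use a far lower frequency.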
The code also passes the CircleCI token to the header.
Updating configuration file
Before running the scheduled pipeline, we need to update the CircleCI pipeline configuration script. Open .circleci/config.yml
file and replace its content with this:
version: 2.1
orbs:
heroku: circleci/heroku@1.2.6
jobs:
build:
executor: heroku/default
steps:
- checkout
- heroku/install
- heroku/deploy-via-git:
force: true
schedule_backup:
working_directory: ~/project
docker:
- image: cimg/node:17.4.0
steps:
- checkout
- run:
name: Install MongoDB Tools.
command: |
npm install
sudo apt-get update
sudo apt-get install -y mongodb
- run:
name: Run database back up
command: npm run backup
parameters:
run-schedule:
type: boolean
default: false
workflows:
deploy:
when:
not: << pipeline.parameters.run-schedule >>
jobs:
- build
backup:
when: << pipeline.parameters.run-schedule >>
jobs:
- schedule_backup
The config now includes a new job named `schedule_backup`. It uses a Docker image that provides Node.js, and installs the MongoDB tools as one of its steps. The config also declares a `run-schedule` pipeline parameter, which is used to decide which workflow runs.

Each workflow has a `when` expression: the `backup` workflow runs only when `run-schedule` is `true`, and the `deploy` workflow runs only when `run-schedule` is `false`.
Creating more environment variables on CircleCI
Just before you add and push all updates to GitHub, add the MongoDB connection string, Azure account name, and key as environment variables on your CircleCI project.
From the current project pipelines page, click the Project Settings button. Next, select Environment Variables from the side menu. Add these variables:
- `ACCOUNT_KEY` is your Microsoft Azure storage account key.
- `ACCOUNT_NAME` is the Microsoft Azure storage account name (`dbblobs` for this tutorial).
- `MONGODB_URI` is your MongoDB connection string.
Now, update git and push your code back to GitHub.
Running the scheduled pipeline
The schedule configuration file is updated and ready to go. To create the scheduled pipeline, run this from the root of your project:
node schedule.js
The output should be similar to this:
{
"description": "Schedule database backup for your app in production",
"updated-at": "2022-03-07T07:07:25.408Z",
"name": "Database backup",
"id": "caa627c8-2768-4ac7-8150-e808fb566cc6",
"project-slug": "gh/CIRCLECI-GWP/db-back-up-schedule",
"created-at": "2022-03-07T07:07:25.408Z",
"parameters": { "branch": "main", "run-schedule": true },
"actor": {
"login": "daumie",
"name": "Dominic Motuka",
"id": "335b50ce-fd34-4a74-bc0b-b6455aa90325"
},
"timetable": {
"per-hour": 30,
"hours-of-day": [
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23
],
"days-of-week": ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]
}
}
Review your scheduled pipelines in action
Return to the pipeline page on CircleCI. Your pipeline will be triggered every two minutes.
This is a good time to open the container within your Azure storage account to confirm that the file has been uploaded successfully.
Bonus section: retrieving a schedule list and deleting a schedule
In this last section, you will learn:
- How to retrieve all the schedules for a particular project
- How to delete any schedule
Retrieve the list of schedules for a project
To fetch all schedules, create a new file named get.js
within the root of the project. Enter this content:
const axios = require("axios").default;
require("dotenv").config();
const API_BASE_URL = "https://circleci.com/api/v2/project";
const vcs = process.env.VCS_TYPE;
const org = process.env.ORG_NAME;
const project = process.env.PROJECT_ID;
const token = process.env.CIRCLECI_TOKEN;
const getSchedulesEndpoint = `${API_BASE_URL}/${vcs}/${org}/${project}/schedule/`;
async function getSchedules() {
let res = await axios.get(getSchedulesEndpoint, {
headers: {
"circle-token": `${token}`,
},
});
console.log(res.data.items[0]);
}
getSchedules();
This snippet fetches the schedules and logs only the first item in the array to your terminal. To see all items, replace `res.data.items[0]` with `res.data.items`.
Now run the file with node get.js
. Your output should be similar to this:
{
description: 'Schedule database backup for your app in production',
'updated-at': '2022-03-07T10:49:58.123Z',
name: 'Database backup',
id: '6aa72c63-b4c4-4dc0-b099-b8661a7a2052',
'project-slug': 'gh/yemiwebby/db-back-up-schedule',
'created-at': '2022-03-07T10:49:58.123Z',
parameters: { branch: 'main', 'run-schedule': true },
actor: {
login: 'yemiwebby',
name: 'Oluyemi',
id: '7b490556-c1bb-4b42-a201-c1785a00005b'
},
timetable: {
'per-hour': 30,
'hours-of-day': [
0, 1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23
],
'days-of-week': [
'MON', 'TUE',
'WED', 'THU',
'FRI', 'SAT',
'SUN'
]
}
}
Delete any schedule
Deleting a schedule requires its unique ID. We can use the ID of the schedule from the previous section for this demonstration.
Create another file named delete.js
and paste this code in it:
const axios = require("axios").default;
require("dotenv").config();
const API_BASE_URL = "https://circleci.com/api/v2/schedule";
const vcs = process.env.VCS_TYPE;
const org = process.env.ORG_NAME;
const project = process.env.PROJECT_ID;
const token = process.env.CIRCLECI_TOKEN;
const schedule_ids = ["YOUR_SCHEDULE_ID"];
async function deleteScheduleById() {
for (let i = 0; i < schedule_ids.length; i++) {
let deleteScheduleEndpoint = `${API_BASE_URL}/${schedule_ids[i]}`;
let res = await axios.delete(deleteScheduleEndpoint, {
headers: { "circle-token": token },
});
console.log(res.data);
}
}
deleteScheduleById();
Replace the `YOUR_SCHEDULE_ID` placeholder with the ID extracted from the previous section and save the file. Next, run `node delete.js` from the terminal. The output is:
{ message: 'Schedule deleted.' }
Conclusion
In this tutorial, you downloaded a sample project from GitHub and ran it locally on your machine before deploying it to the Heroku platform via CircleCI. You then created some records in your MongoDB database and created a script to generate a backup collection of the database using MongoDB tools. You stored the backup file on Microsoft Azure and used the scheduled pipeline feature from CircleCI to automate the file backup process at a reasonable interval.
This tutorial covers an important use case for scheduled pipelines because it automates a task that would otherwise have been done manually. Tasks like scheduling database clean-ups are too important to be left to humans. They take up valuable developer time and in busy or stressful times it is easy to forget them. Scheduling pipelines for database clean-up solves these problems so you and your team have more time to develop and release applications.
I hope that you found this tutorial helpful. The complete source code can be found here on GitHub.