Triggering jobs for database clean up
Fullstack Developer and Tech Author
Over time, large applications generate data that becomes “stale”. Year-old request logs or the coordinates of a dispatch rider for an already delivered order are just not relevant any longer. Getting rid of redundant or obsolete data keeps the database clean and uncluttered. These types of data should be archived, or removed from the database on a regular schedule.
In this tutorial, I will demonstrate one strategy for cleaning up databases. We will be clearing log data from a Node.js application using an automated scheduled job in a CI pipeline.
Prerequisites
To complete this tutorial, you will need:
- Basic knowledge of JavaScript
- Node.js installed on your system
- Postman for desktop installed on your system
- Heroku or another application hosting platform
- A CircleCI account
- A GitHub account
When these items are installed and set up, we can begin the tutorial.
Cloning the logs demo project
To begin, you will need to clone the Node.js project. We will be working with a simple Node.js API application with a root endpoint and two other endpoints for fetching and cleaning up log data. Clone the project by running this command:
git clone --single-branch --branch base-project https://github.com/coderonfleek/db-cleanups.git
When the cloning process is complete, go to the root of the project and install the dependencies:
cd db-cleanups
npm install
Next, run the application:
npm start
The application will start listening on a default port of 3000
.
Open up Postman and make a GET
request to the http://localhost:3000/logs/fetch/50
endpoint. This will return an array of 50 log data, each containing an event and the date that it occurred.
Notice that some of the logs are over a year old.
Testing the logs
cleanup locally
Remember, this project also consists of an endpoint that cleans up outdated data. Here is the code for the clean-up endpoint in server.js
:
app.get("/logs/clean", async (req, res) => {
try {
let d = new Date();
let currentYear = d.getFullYear();
let today = new Date(`${currentYear}-01-01`);
let total_records_to_delete = await req.db
.collection("logs")
.find({
date: {
$lt: today
}
})
.count();
const removedResult = await req.db.collection("logs").deleteMany({
date: { $lt: new Date(today) }
});
res.status(200).send({
message: `There are ${total_records_to_delete} to delete`,
result: removedResult
});
} catch (error) {
console.log(error);
return res.status(500).send(error);
}
});
The handler script for the clean-up endpoint gets the current year. It then deletes every log record that is older than the current year.
Run this endpoint (http://localhost:3000/logs/clean
) in Postman.
The success message for the clean-up operation shows that 21 obsolete rows (this number may vary) were found for deletion based on the set criteria. The return value of the delete operation is captured in the result
property of the response data.
To be most effective, a clean-up endpoint like this should be called regularly, on a schedule that makes sense for your team. The next part of this tutorial is automating the clean-up using scheduled jobs.
Creating the project’s Heroku app for hosting
We need to deploy the application before we can schedule a job to call its clean-up endpoint. For this tutorial, we will create a Heroku app.
Go to your Heroku dashboard and create a new hosting application. Click New, then Create new app.
After you have the app set up, make a note of the app name (in this case db-cleanups
). Get your Heroku API key from the Account Settings page. We will be using the API key later on to set up environment variables on CircleCI.
Connecting the project to CircleCI
Begin by pushing your project to GitHub.
Note: The cloned project may throw an error about having already been initialized as a git repo (or one that a remote repo is already contained). If this happens, run rm -rf .git
to delete any existing git
artifacts. Then re-initialize it with git init
After you have committed all changes, go to the Add Projects page on the CircleCI dashboard to add the project.
Click Set Up Project.
On the setup page, click Use Existing Config to indicate that you are setting up a configuration file manually and not using the sample displayed. Next, you get a prompt to either download a configuration file for the pipeline, or to start building.
Click Start Building. This build will fail because we have not set up our configuration file yet. We’ll do this in the next step.
Before you leave the CircleCI console, you need to set up environment variables to deploy the application to both environments (staging and production) on Heroku.
From the Pipelines page, select your application, then click Project Settings.
From the Project Settings side-menu, click Environment Variables, then clickAdd Environment Variable.
Add these variables:
HEROKU_APP_NAME
: The Heroku app name project (in this casedb-cleanups
)HEROKU_API_KEY
: Your Heroku API Key
Now you are ready to deploy the project to the Heroku hosting platform.
Writing the deploy and scheduled clean-up pipeline script
We have arrived at the main goal of this tutorial, which is writing a script to periodically call the clean-up endpoint to get rid of obsolete log data.
Start by creating a folder named .circleci
at the root of the project. Add a configuration file named config.yml
inside the folder you just created. Enter the following code:
version: 2.1
orbs:
heroku: circleci/heroku@0.0.10
jobs:
build:
executor: heroku/default
steps:
- checkout
- heroku/install
- heroku/deploy-via-git
clean:
docker:
- image: circleci/node:10.16.3
steps:
- run:
name: Clean Database
command: "curl https://[YOUR_HEROKU_APP_NAME].herokuapp.com/logs/clean"
workflows:
deploy:
jobs:
- build
clean_old_logs:
triggers:
- schedule:
cron: "* * * * *" # use cron syntax to set the schedule
filters:
branches:
only:
- main
jobs:
- clean
In this script, there are two jobs that do the following:
build
: Uses the CircleCI Heroku orb to deploy the logs project to Herokuclean
: Runs acurl
command that calls the cleanup endpoint for the deployed application.
Note: Make sure you replace YOUR_HEROKU_APP_NAME
with the name you entered for your Heroku app.
After creating the jobs for deploying and running the clean-up endpoint, the script defines two workflows:
deploy
: Runs thebuild
job to deploy the application to Herokuclean_old_logs
: Defines a trigger that usescron
syntax to set a schedule and runs theclean
job every minute
In this tutorial, we are triggering the clean
job every minute to demonstrate how the job runs periodically. In practice, you would run this type of job once a year. Use the appropriate cron
syntax to implement the correct schedule. To find out what works best, you can tinker with this cron schedule expression.
Commit all your changes and push the code to the remote repository to run the script and trigger the deployment pipeline.
The Pipelines page shows the deploy
workflow run. It also shows that, each minute, the scheduled clean_old_logs
workflow runs.
Click the clean
job from the first clean_old_logs
operation to review the response of the call to the clean-up endpoint.
![Clean up response - CircleCI]22œ(//images.ctfassets.net/il1yandlcjgk/3yI1ZE4abL8QP3VVQtbM3U/2999112a0ed2f847679adb0c9e130251/2020-10-13-cleanup-job-details.png){: .zoomable }
The response removed 18 records from the database (the actual number might be different). Click the next run of the clean
job. The response shows that there are 0 outdated records.
The first clean
operation removed all the records older than this year.
Conclusion
In this tutorial, you have used scripts to schedule the clean up of obsolete data from your database. Scheduled database clean up is one more reason to love CI/CD pipelines and process automation.
Happy coding!