Over time, large applications generate data that becomes “stale”. Year-old request logs or the coordinates of a dispatch rider for an already delivered order are just not relevant any longer. Getting rid of redundant or obsolete data keeps the database clean and uncluttered. These types of data should be archived, or removed from the database on a regular schedule.

In this tutorial, I will demonstrate one strategy for cleaning up databases. We will be clearing log data from a Node.js application using an automated scheduled job in a CI pipeline.

Prerequisites

To complete this tutorial, you will need:

  1. Basic knowledge of JavaScript
  2. Node.js installed on your system
  3. Postman for desktop installed on your system
  4. Heroku or another application hosting platform
  5. A CircleCI account
  6. A GitHub account

When these items are installed and set up, we can begin the tutorial.

Cloning the logs demo project

To begin, you will need to clone the Node.js project. We will be working with a simple Node.js API application with a root endpoint and two other endpoints for fetching and cleaning up log data. Clone the project by running this command:

git clone --single-branch --branch base-project https://github.com/coderonfleek/db-cleanups.git

When the cloning process is complete, go to the root of the project and install the dependencies:

cd db-cleanups
npm install

Next, run the application:

npm start

The application will start listening on a default port of 3000. Open up Postman and make a GET request to the http://localhost:3000/logs/fetch/50 endpoint. This will return an array of 50 log data, each containing an event and the date that it occurred.

Get Logs - Postman

Notice that some of the logs are over a year old.

Testing the logs cleanup locally

Remember, this project also consists of an endpoint that cleans up outdated data. Here is the code for the clean-up endpoint in server.js:

app.get("/logs/clean", async (req, res) => {
  try {
    let d = new Date();
    let currentYear = d.getFullYear();
    let today = new Date(`${currentYear}-01-01`);

    let total_records_to_delete = await req.db
      .collection("logs")
      .find({
        date: {
          $lt: today
        }
      })
      .count();

    const removedResult = await req.db.collection("logs").deleteMany({
      date: { $lt: new Date(today) }
    });

    res.status(200).send({
      message: `There are ${total_records_to_delete} to delete`,
      result: removedResult
    });
  } catch (error) {
    console.log(error);
    return res.status(500).send(error);
  }
});

The handler script for the clean-up endpoint gets the current year. It then deletes every log record that is older than the current year.

Run this endpoint (http://localhost:3000/logs/clean) in Postman.

Clean Logs - Postman

The success message for the clean-up operation shows that 21 obsolete rows (this number may vary) were found for deletion based on the set criteria. The return value of the delete operation is captured in the result property of the response data.

To be most effective, a clean-up endpoint like this should be called regularly, on a schedule that makes sense for your team. The next part of this tutorial is automating the clean-up using scheduled jobs.

Creating the project’s Heroku app for hosting

We need to deploy the application before we can schedule a job to call its clean-up endpoint. For this tutorial, we will create a Heroku app.

Go to your Heroku dashboard and create a new hosting application. Click New, then Create new app.

New App - Postman

After you have the app set up, make a note of the app name (in this case db-cleanups). Get your Heroku API key from the Account Settings page. We will be using the API key later on to set up environment variables on CircleCI.

Connecting the project to CircleCI

Begin by pushing your project to GitHub.

Note: The cloned project may throw an error about having already been initialized as a git repo (or one that a remote repo is already contained). If this happens, run rm -rf .git to delete any existing git artifacts. Then re-initialize it with git init

After you have committed all changes, go to the Add Projects page on the CircleCI dashboard to add the project.

Add Project - CircleCI

Click Set Up Project.

Add Config - CircleCI

On the setup page, click Use Existing Config to indicate that you are setting up a configuration file manually and not using the sample displayed. Next, you get a prompt to either download a configuration file for the pipeline, or to start building.

Build Prompt - CircleCI

Click Start Building. This build will fail because we have not set up our configuration file yet. We’ll do this in the next step.

Before you leave the CircleCI console, you need to set up environment variables to deploy the application to both environments (staging and production) on Heroku.

From the Pipelines page, select your application, then click Project Settings.

Project Settings - CircleCI

From the Project Settings side-menu, click Environment Variables, then clickAdd Environment Variable.

Add Environment variable - CircleCI

Add these variables:

  • HEROKU_APP_NAME: The Heroku app name project (in this case db-cleanups)
  • HEROKU_API_KEY: Your Heroku API Key

Now you are ready to deploy the project to the Heroku hosting platform.

Writing the deploy and scheduled clean-up pipeline script

We have arrived at the main goal of this tutorial, which is writing a script to periodically call the clean-up endpoint to get rid of obsolete log data.

Start by creating a folder named .circleci at the root of the project. Add a configuration file named config.yml inside the folder you just created. Enter the following code:

version: 2.1
orbs:
  heroku: circleci/heroku@0.0.10
jobs:
  build:
    executor: heroku/default
    steps:
      - checkout
      - heroku/install
      - heroku/deploy-via-git

  clean:
    docker:
      - image: circleci/node:10.16.3
    steps:
      - run:
          name: Clean Database
          command: "curl https://[YOUR_HEROKU_APP_NAME].herokuapp.com/logs/clean"

workflows:
  deploy:
    jobs:
      - build
  clean_old_logs:
    triggers:
      - schedule:
          cron: "* * * * *" # use cron syntax to set the schedule
          filters:
            branches:
              only:
                - main
    jobs:
      - clean

In this script, there are two jobs that do the following:

  • build: Uses the CircleCI Heroku orb to deploy the logs project to Heroku
  • clean: Runs a curl command that calls the cleanup endpoint for the deployed application.

Note: Make sure you replace YOUR_HEROKU_APP_NAME with the name you entered for your Heroku app.

After creating the jobs for deploying and running the clean-up endpoint, the script defines two workflows:

  • deploy: Runs the build job to deploy the application to Heroku
  • clean_old_logs: Defines a trigger that uses cron syntax to set a schedule and runs the clean job every minute

In this tutorial, we are triggering the clean job every minute to demonstrate how the job runs periodically. In practice, you would run this type of job once a year. Use the appropriate cron syntax to implement the correct schedule. To find out what works best, you can tinker with this cron schedule expression.

Commit all your changes and push the code to the remote repository to run the script and trigger the deployment pipeline.

The Pipelines page shows the deploy workflow run. It also shows that, each minute, the scheduled clean_old_logs workflow runs.

Scheduled Jobs - CircleCI

Click the clean job from the first clean_old_logs operation to review the response of the call to the clean-up endpoint.

![Clean up response - CircleCI]22œ(//images.ctfassets.net/il1yandlcjgk/3yI1ZE4abL8QP3VVQtbM3U/2999112a0ed2f847679adb0c9e130251/2020-10-13-cleanup-job-details.png){: .zoomable }

The response removed 18 records from the database (the actual number might be different). Click the next run of the clean job. The response shows that there are 0 outdated records.

Clean up response - CircleCI

The first clean operation removed all the records older than this year.

Conclusion

In this tutorial, you have used scripts to schedule the clean up of obsolete data from your database. Scheduled database clean up is one more reason to love CI/CD pipelines and process automation.

Happy coding!


Fikayo Adepoju is a LinkedIn Learning (Lynda.com) Author, Full-stack developer, technical writer, and tech content creator proficient in Web and Mobile technologies and DevOps with over 10 years experience developing scalable distributed applications. With over 40 articles written for CircleCI, Twilio, Auth0, and The New Stack blogs, and also on his personal Medium page, he loves to share his knowledge to as many developers as would benefit from it. You can also check out his video courses on Udemy.

Read more posts by Fikayo Adepoju