With automation and CI/CD practices, the entire AI workflow can be run and monitored efficiently, often by a single expert. Still, running AI/ML on GPU instances has its challenges. This tutorial shows you how to meet those challenges using the control and flexibility of CircleCI runners combined with Scaleway, a powerful cloud ecosystem for building, training, and deploying applications at scale. We will also demonstrate the cost effectiveness of ephemeral runners that consume only the resources required for AI/ML work.

Prerequisites

This tutorial builds on an MLOps pipeline example first introduced in our blog series on CI and CD for machine learning. We recommend reading that series first to gain a better understanding of the project we will be working with.

The repository differs slightly from the previous examples so that it works with Scaleway as the cloud provider and Pulumi for provisioning.

If you don’t have them yet, you will need to create CircleCI, Scaleway, and Pulumi accounts.

Scaleway offers 100 EUR of credit for newly created accounts, which will come in handy when trying this out. The credit is also valid for certain GPU instances, including the one the demo project uses, provided you verify your identity and payment method following their instructions.

For CircleCI we will be using the free tier. Same for Pulumi.

For CircleCI, make sure you are an admin in the organization you are using for the project. Configuring new runner namespaces will require admin access.

Note: This is an advanced tutorial, not aimed at beginners; it assumes familiarity with CircleCI, CI/CD concepts, and infrastructure as code.

High-level project flow

As the pipeline starts, we first provision our environment and create a new runner resource class in CircleCI. This is how CircleCI will communicate with our Scaleway infrastructure.

We will then use Pulumi to provision two GPU instances on Scaleway:

  • One to host our CircleCI runner and act as our CI/CD agent for training and deploying the model
  • Another to act as a model server, to which we will deploy our trained models

Note: In real-world production scenarios, you would likely not provision the model serving instance from the same pipeline, as you would need it to be permanent rather than ephemeral.

After everything is provisioned, we will install the required dependencies, then train, test, and deploy our model, all executed on the newly provisioned CircleCI runner. These jobs are covered thoroughly in the CI/CD for ML blog post series, so we will only glance at them here.

Finally, the resources are cleaned up, and the newly created CircleCI runner is removed so the pipeline can run again.

Walkthrough and project setup

We recommend you fork the sample repository and continue from there.

Once you have a fork of the project in your GitHub account, you can set it up on CircleCI as a new project.

This guide will show you how to get started, and then walk you through various files that comprise the pipeline.

Preparing environment variables

We need a number of secrets and environment variables set up before the pipeline can be run. The secrets are split logically into four contexts.

First, create a new CircleCI API key and store it in a CircleCI context named circleci-api as CIRCLECI_CLI_TOKEN. This is used to provision new runners from within the pipeline.

Next, create a new Pulumi access token and store it in a context named pulumi as PULUMI_ACCESS_TOKEN.

Next, create a Scaleway API key. This generates two values: an access key and a secret key. Create a scaleway context and store them as SCW_ACCESS_KEY and SCW_SECRET_KEY, respectively.

Finally, create a context ml-scaleway-demo and populate it with three environment variables:

  1. DEPLOY_SERVER_USERNAME (we use demo)
  2. DEPLOY_SERVER_PASSWORD (we use demodemo)
  3. DEPLOY_SERVER_PATH as /var/models

Pulumi project and stack setup

Pulumi is the tool that will help us provision infrastructure. It offers an SDK-based approach to infrastructure provisioning in many different programming languages, which allows us to use Python, the same language as the rest of the AI/ML scripts.

In Pulumi you will need to create a new project and stack. A stack is an isolated, independently configurable instance of a Pulumi project, often one per environment. Our project is located in the org zmarkan-demos and is named cci-ml-runner. It contains one stack, cci-runner-linux.

The Pulumi files are located in the pulumi directory; you might want to modify them with your preferred project and stack names, as well as your Scaleway configuration.
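If you prefer to register the stack from your local machine before the first pipeline run, a minimal sketch looks like this (the names are ours, so substitute your own org and stack; install the Python dependencies from requirements.txt first if you also want to run a preview):

cd pulumi
pulumi login                                    # uses PULUMI_ACCESS_TOKEN if it is set
pulumi stack init zmarkan-demos/cci-runner-linux
pulumi preview                                  # optional sanity check of the planned resources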

Scaleway project setup

In Scaleway we also created a new project named cci-ml-runner. Go to Project Settings, copy the Project ID (it should be in UUID format), and paste it into the file pulumi/Pulumi.cci-runner-linux.yaml where you see scaleway:project_id. Leave the rest of the file unchanged; we need this specific region and zone combination to access the GPU resources.

config:
 scaleway:project_id: YOUR_PROJECT_ID_UUID
 scaleway:region: fr-par
 scaleway:zone: fr-par-2

To use the Scaleway instances from your local command line, you might also want to add your SSH key to the Scaleway project. Without it you won’t be able to SSH into any of the instances to debug or inspect them.
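For example, once the stack is up, you could reach the runner instance like this (a sketch only, assuming the default root user and that your key is registered with the project):

# Grab the runner instance IP from the Pulumi outputs and SSH into it
ssh root@"$(pulumi stack output cci_runner_ip --cwd pulumi --stack zmarkan-demos/cci-ml-runner/cci-runner-linux)"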

Setting up the CircleCI pipeline

In our pipeline we have one workflow of interest: build-deploy. It contains every job in the pipeline, from provisioning the runner and infrastructure through to tearing everything down again.

Let’s look at the first job: provision_runner.

Provision Runner

This job does most of the heavy lifting for provisioning cloud infrastructure and configuring the runner. It does it all in an automated way so that instances are truly ephemeral and consume only the resources required for the rest of the AI/ML work.

The job runs in a standard CircleCI Docker executor:

jobs:
 provision_runner:
   docker:
     - image: cimg/python:3.11

We first install the CircleCI CLI, which we will use to create the new runner resource class. The CLI authenticates using the CIRCLECI_CLI_TOKEN environment variable we created earlier.

- run:
          name: Install CircleCI CLI
          command: |
            # Make CircleCI CLI available at /usr/local/bin/circleci
            curl -fLSs https://raw.githubusercontent.com/CircleCI-Public/circleci-cli/main/install.sh | sudo bash

Then we provision a new runner resource class using the CLI and pass it to the cloud-init script used by Pulumi for provisioning:

- run:
         name: Provision new runner and prepare cloud-init file
         command: |
           runner_token_response=$(/usr/local/bin/circleci runner resource-class create zans-cci-org/scaleway-linux-<<pipeline.number>> "Autoprovisioned Linux runner on Scaleway" --generate-token)
           export runner_token=$(echo $runner_token_response | grep "auth_token:" | awk '{print $3}')
           sed "s/RUNNER_TOKEN/${runner_token}/g" pulumi/runner_cloud_init_base.yml > pulumi/runner_cloud_init.yml

To break it down line by line:

runner_token_response=$(/usr/local/bin/circleci runner resource-class create zans-cci-org/scaleway-linux-<<pipeline.number>> "Autoprovisioned Linux runner on Scaleway" --generate-token)

This runs the circleci runner resource-class create command to create your runner’s resource class. In the example, we use the zans-cci-org namespace, but you should create your own.

Note: Remember that once a namespace is created for your organization, it cannot be changed, and it must be used for all runner resource classes you create from then on. For eternity.

The full resource class name is zans-cci-org/scaleway-linux-<<pipeline.number>>, which uses the pipeline.number pipeline value to inject a numeric value into the name so that it is unique for each run.

We also give it a description: “Autoprovisioned Linux runner on Scaleway”. This can be changed to anything you like.

Finally, the --generate-token flag ensures that we create a new token for our runner’s resource class.

The whole command is wrapped in a command substitution, and the result is stored in the runner_token_response variable to be used in the next line of the script.

export runner_token=$(echo $runner_token_response | grep "auth_token:" | awk '{print $3}')

This extracts just the token value into the runner_token variable, by filtering for the auth_token field with grep and picking out the token’s value with awk.

Tip: If you are struggling to figure out the right command to parse a value out of a response, feed the whole response into an LLM and ask it for a command to extract the part you need.

The final line injects the value of the runner resource class token into our cloud-init template to be used by Pulumi.

sed "s/RUNNER_TOKEN/${runner_token}/g" pulumi/runner_cloud_init_base.yml > pulumi/runner_cloud_init.yml

In pulumi/runner_cloud_init_base.yml we have a stub for the token, which we replace using sed. This is a practical way to do templating in CI/CD pipelines.

Next, we run the two Pulumi commands to provision our infrastructure:

     - pulumi/login
     - pulumi/update:
         stack: zmarkan-demos/cci-ml-runner/cci-runner-linux
         working_directory: pulumi

These commands use the Pulumi orb for CircleCI, which is declared at the top of the .circleci/config.yml file:

orbs:
 pulumi: pulumi/pulumi@2.1.0

pulumi/login authenticates with the authentication token we provided as a secret, and pulumi/update runs the actual provisioning for the specified stack.

Make sure to change the stack value to reflect your own organization, project, and stack names.

After the Pulumi update has run, we take the IP address of the newly created model server from the Pulumi output and store it in a .env file as the variable DEPLOY_SERVER_HOSTNAME. That file is persisted to the CircleCI workspace so it can be shared with subsequent jobs.

- run:
         name: Store model server IP to workspace
         command: |
           mkdir workspace
           echo "DEPLOY_SERVER_HOSTNAME=$(pulumi stack output modelserver_ip  --cwd pulumi --stack  zmarkan-demos/cci-ml-runner/cci-runner-linux)" > workspace/.env
     - persist_to_workspace:
         root: workspace
         paths:
           - .env

For that we use the pulumi stack output command, requesting the modelserver_ip output exported by Pulumi. We pass it the correct working directory (--cwd pulumi) and the correct stack (--stack zmarkan-demos/cci-ml-runner/cci-runner-linux). The result is written to workspace/.env, and the file is stored in a workspace using the persist_to_workspace step.

That’s it for the provision_runner job. The whole job looks like this:

jobs:
 provision_runner:
   docker:
     - image: cimg/python:3.11

   steps:
     - checkout
      - run:
          name: Install CircleCI CLI
          command: |
            # Make CircleCI CLI available at /usr/local/bin/circleci
            curl -fLSs https://raw.githubusercontent.com/CircleCI-Public/circleci-cli/main/install.sh | sudo bash

     - run:
         name: Provision new runner and prepare cloud-init file
         command: |
           runner_token_response=$(/usr/local/bin/circleci runner resource-class create zans-cci-org/scaleway-linux-<<pipeline.number>> "Autoprovisioned Linux runner on Scaleway" --generate-token)
           export runner_token=$(echo $runner_token_response | grep "auth_token:" | awk '{print $3}')
           sed "s/RUNNER_TOKEN/${runner_token}/g" pulumi/runner_cloud_init_base.yml > pulumi/runner_cloud_init.yml

     - pulumi/login
     - pulumi/update:
         stack: zmarkan-demos/cci-ml-runner/cci-runner-linux
         working_directory: pulumi
     - run:
         name: Store model server IP to workspace
         command: |
           mkdir workspace
           echo "DEPLOY_SERVER_HOSTNAME=$(pulumi stack output modelserver_ip  --cwd pulumi --stack  zmarkan-demos/cci-ml-runner/cci-runner-linux)" > workspace/.env
     - persist_to_workspace:
         root: workspace
         paths:
           - .env

Pulumi provisioning scripts and Scaleway GPU resource configuration

Pulumi scripts live in the pulumi directory of the project. This is where we configure the resources. The files of note are:

  • Pulumi.yaml: Project and language configuration for the SDK.
  • Pulumi.cci-runner-linux.yaml: Stack-specific configuration, such as the Scaleway project ID, region, and zone. We populated this with your project ID earlier.
  • requirements.txt: Python’s dependency spec for Pulumi.
  • __main__.py: The main Pulumi script where the resources are declared.
  • runner_cloud_init_base.yml: cloud-init template script for the CircleCI runner instance, executed at first boot to bootstrap it.
  • modelserver_cloud_init.yml: cloud-init script for the model server instance.

Scaleway instances

Looking at __main__.py, we have two resources created using the scaleway.InstanceServer function: modelTrainingCCIRunner and tensorflowServer. Let’s break down modelTrainingCCIRunner:

modelTrainingCCIRunner = scaleway.InstanceServer("runnerServerLinuxGPU",
   type="GPU-3070-S",
   image="ubuntu_jammy_gpu_os_12",
   ip_id=runnerPublicIp.id,
   root_volume=scaleway.InstanceServerRootVolumeArgs(
       size_in_gb=80,
       volume_type="b_ssd",
   ),
   user_data={
       "cloud-init": (lambda path: open(path).read())(f"runner_cloud_init.yml"),
   }
)

First we label it runnerServerLinuxGPU, which is how it will be named in the Scaleway dashboard.

Then, we declare the instance type and image to use.

In our case, we are using GPU-3070-S for type and ubuntu_jammy_gpu_os_12 as the image. Scaleway has a large variety of instance types. We use the smallest GPU on offer — GPU-3070-S with 8 vCPUs and 16GB RAM — but you could go all the way to the top of the range H100-2-80G with two H100 GPUs and 480 GB of RAM.

Scaleway’s GPU instances come with tooling such as CUDA and Docker preconfigured, which makes it easy to start executing ML workloads.

We also need to pass in the root volume (in our case an 80 GB block SSD) and a cloud-init script as user_data.

This is the script that we populated with our CircleCI runner token earlier in the pipeline.

root_volume=scaleway.InstanceServerRootVolumeArgs(
       size_in_gb=80,
       volume_type="b_ssd",
   ),
   user_data={
       "cloud-init": (lambda path: open(path).read())(f"runner_cloud_init.yml"),
   }

The other instance we are configuring is our TensorFlow server, which will serve our models in production using the tensorflow/serving Docker container. It is configured as follows:

tensorflowServer = scaleway.InstanceServer("tensorflowServerLinux",
   type="PRO2-M",
   image="ubuntu_jammy",
   ip_id=serverPublicIp.id,
   root_volume=scaleway.InstanceServerRootVolumeArgs(
       size_in_gb=40,
       volume_type="b_ssd",
   ),
   user_data={
       "cloud-init": (lambda path: open(path).read())(f"modelserver_cloud_init.yml")
   }
)

Note that the cloud-init script is different from the one in the example above (we will look at both of them shortly) and that we are using a CPU-based instance, not a GPU one. Ideally, model serving would also be done on a GPU-based instance. However, to keep this tutorial accessible, we are staying on the Scaleway free plan, which is limited to one concurrent GPU instance.

Finally, the IPs and IDs of both created instances are exported. We are only using modelserver_ip for the time being in the pipeline.

pulumi.export("cci_runner_ip", modelTrainingCCIRunner.public_ip)
pulumi.export("cci_runner_id", modelTrainingCCIRunner.id)
pulumi.export("modelserver_id", tensorflowServer.id)
pulumi.export("modelserver_ip", tensorflowServer.public_ip)

Now, let’s look at the two cloud-init scripts that bootstrap the instances. We will start with the runner_cloud_init_base.yml, which configures the CircleCI runner on the GPU instance:

#!/bin/sh
export runner_token="RUNNER_TOKEN"
echo "Runner token $runner_token"

# CircleCI Runner installation
curl -s https://packagecloud.io/install/repositories/circleci/runner/script.deb.sh?any=true | sudo bash

sudo apt-get install -y circleci-runner python3.10-venv

# Add CCI user to Docker
usermod -aG docker circleci

sudo sed -i "s/<< AUTH_TOKEN >>/$runner_token/g" /etc/circleci-runner/circleci-runner-config.yaml

# Prepare and start runner daemon
sudo systemctl enable circleci-runner && sudo systemctl start circleci-runner

Both cloud-init scripts are standard shell scripts that perform the configuration each instance requires at first boot.

This script installs the CircleCI runner, following the instructions for installing Machine Runner 3.0 as of November 2023.

We set up the apt repository using the runner installation script, install the circleci-runner package, and add the newly created circleci user to the docker group.

We then write the runner_token value (injected into this script by the pipeline when the resource class was provisioned) into circleci-runner-config.yaml, which allows the runner to authenticate with CircleCI’s servers.

Finally, we start the runner daemon with sudo systemctl enable circleci-runner && sudo systemctl start circleci-runner.

Note: If you want to do any more runner setup in the cloud-config script, make sure to leave the instruction to start the runner as the last thing in the script. Once systemctl start circleci-runner gets called, the runner is available to any jobs requiring that resource class, and if an instance hasn’t finished setting up, you might encounter unexpected behavior.

Now, let’s look at the other cloud-init script: modelserver_cloud_init.yml.

This script installs Docker Engine, sets up the tensorflow-serving Docker container, and prepares our environment for pushing and deploying new models.

We also create a new SSH user demo, which will be used by the pipeline to push models to the server.

The Docker Engine installation steps are taken from this article on Scaleway’s tutorials site, which you can consult for more details.

To prepare the serving and upload directories we need to first create /var/models and set ownership to the docker group:

# Create the directories and grant permissions so that the user defined in the .env file and docker can read/write to them
sudo mkdir -p /var/models/staging # so that docker will have something to bind to, it will be populated later
sudo mkdir -p /var/models/prod
sudo chown -R $USER:docker /var/models
sudo chmod -R 775 /var/models

Next we create the demo user that will be used for uploading models from the pipeline:

# Create demo user for sftp upload
useradd demo
mkdir /home/demo
chown demo:demo /home/demo
# # set demo user password 
# Warning! hardcoded for demo purpose as this server is short-lived
echo 'demo:demodemo' | chpasswd

# add demo user to docker group
usermod -aG docker demo

# Allow SSH access via password auth
sed -i 's|PasswordAuthentication no|PasswordAuthentication yes|g' /etc/ssh/sshd_config
systemctl restart ssh

You might notice some hard-coded strings for password authentication in here. This is of course only for demo purposes, as the serving server is only live for the duration of the pipeline and its IP is not known outside the project. In a production setting we would keep this server up permanently and enable access through more secure means, such as SSH keys.
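For reference, here is a sketch of what key-based access for the demo user could look like in the same cloud-init script (the public key is a placeholder you would replace with your own):

# Give the demo user an authorized SSH key instead of a password
mkdir -p /home/demo/.ssh
echo 'ssh-ed25519 AAAA...replace-with-your-public-key...' >> /home/demo/.ssh/authorized_keys
chown -R demo:demo /home/demo/.ssh
chmod 700 /home/demo/.ssh
chmod 600 /home/demo/.ssh/authorized_keys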

Finally, download the tensorflow_serving image and run the container:


# Download the TensorFlow Serving Docker image and repo
docker pull tensorflow/serving

# Create a TensorFlow Serving container with the directories configured for use with this example
docker run -d --name tensorflow_serving -p 8501:8501 -v /var/models/prod:/models/my_model -e MODEL_NAME=my_model tensorflow/serving

Now, our model server is ready for the newly trained models to be uploaded!
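Once a model has been deployed to /var/models/prod, you can check that TensorFlow Serving has loaded it by querying its REST API on the port we exposed. This is a quick sanity check, not part of the pipeline itself:

# Replace MODEL_SERVER_IP with the server's public IP (the modelserver_ip Pulumi output)
curl http://MODEL_SERVER_IP:8501/v1/models/my_model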

Using the new runner resource class to run AI/ML workloads in a CircleCI pipeline

After provisioning of cloud infrastructure and runner resources is complete, we can move on to the “fun stuff”. The subsequent jobs — install-build, train, test, package, deploy, and test-deployment — are based on our existing blog post. You can review that tutorial for more detail. This tutorial covers the differences between the two repositories that are specific to building on Scaleway’s GPU infrastructure.

In short, for each of the jobs there is a corresponding Python script in the ml/ directory that performs the tasks for that segment of the pipeline: building the dataset, training, testing, and so on. The jobs use the CircleCI workspace to pass intermediate artifacts between each other.

In all of these jobs we will use the newly created resource class zans-cci-org/scaleway-linux-NUMBER. For ease of reuse, it is declared as an executor in the config.yml:

executors:
 # One freshly baked runner, straight from the boulang... err, pipeline
 scaleway_runner_linux:
   machine: true
   resource_class: zans-cci-org/scaleway-linux-<< pipeline.number >>

The resource class name, zans-cci-org/scaleway-linux-<< pipeline.number >>, corresponds to the number of the pipeline being run. For example, on the 120th run of our pipeline, the resource class name would have 120 as its suffix. This ensures that our runner resources are recreated fresh on every run.

All the jobs declare this executor, which makes them run on our Scaleway instance.

install-build:
   executor: scaleway_runner_linux
   steps:
    …

When the model has been trained and tested, it’s time to package and deploy it. We will use our model server instance for that, passing the newly created instance’s IP to the jobs via the workspace.

We already used the workspace to store our .env file containing DEPLOY_SERVER_HOSTNAME, and now we need to retrieve it in the package job.

We have a populate-env command which grabs it from the workspace and makes it available to subsequent steps in the job as an environment variable via the $BASH_ENV file:

populate-env:
   steps:
     - attach_workspace:
         at: .
     - run:
         name: Restore secrets from workspace and add to environment vars
         # Environment variables must be configured in a CircleCI project or context
         command: |
           cat .env >> $BASH_ENV
           source $BASH_ENV

The ml/4_package.py script then takes the created model and uses SFTP to upload it to the model server’s staging directory.
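The upload itself happens in Python inside that script. Purely as an illustration, an equivalent transfer from a shell would look roughly like this, assuming sshpass is available and the environment variables from our context and workspace are set (the local ./model directory is a placeholder for wherever the packaged model lives):

# Copy the packaged model into the server's staging directory over SSH/SFTP
sshpass -p "$DEPLOY_SERVER_PASSWORD" \
  scp -o StrictHostKeyChecking=no -r ./model \
  "$DEPLOY_SERVER_USERNAME@$DEPLOY_SERVER_HOSTNAME:$DEPLOY_SERVER_PATH/staging/"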

The same step is used in the deploy and test-deployment jobs, which also need access to the server’s IP address for their work.
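To illustrate the kind of call test-deployment can make (the exact checks live in the job’s Python script, and the input shape depends on the model you trained), a TensorFlow Serving prediction request over REST looks like this:

# MODEL_SERVER_IP is the modelserver_ip output; the instances payload is a placeholder
curl -s -X POST "http://MODEL_SERVER_IP:8501/v1/models/my_model:predict" \
  -H 'Content-Type: application/json' \
  -d '{"instances": [[0.1, 0.2, 0.3, 0.4]]}'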

Now, let’s have a look at the orchestrated build-deploy workflow with all the jobs:

workflows:
 # This workflow does a full build from scratch and deploys the model
 build-deploy:
   jobs:
     - provision_runner:
         context:
           - pulumi
           - scaleway
           - circleci-api
     - install-build:
         requires:
           - provision_runner
         context:
           - ml-scaleway-demo
     - train:
         requires: 
           - install-build
     - test:
         requires:
           - train
     - package:
         requires:
           - test
         context:
           - ml-scaleway-demo
     # Do not deploy without manual approval - you can inspect the console output from training and make sure you are happy to deploy
     - deploy:
         requires:
           - package
         context:
           - ml-scaleway-demo
     - test-deployment:
         requires:
           - deploy
         context:
           - ml-scaleway-demo
     - approve_destroy:
         type: approval
         requires:
           - test-deployment
     - destroy_runner:
         context:
           - pulumi
           - scaleway
           - circleci-api
         requires:
           - approve_destroy

In the workflow definition we orchestrate all the jobs in our pipeline, specifying the conditions under which each one runs. We also pass in the contexts that hold the right environment variables.

For example, our provision_runner job needs access to multiple environment variables to access Pulumi, Scaleway, and the CircleCI API keys, so we pass all three to it.

     - provision_runner:
         context:
           - pulumi
           - scaleway
           - circleci-api

Similarly, the deploy job needs access to the deployment server credentials and path, which are stored in the ml-scaleway-demo context:

- deploy:
         requires:
           - package
         context:
           - ml-scaleway-demo

The final jobs in the workflow are approve_destroy, which introduces a manual approval step, and destroy_runner, which cleans up the created infrastructure:

- approve_destroy:
         type: approval
         requires:
           - test-deployment
     - destroy_runner:
         context:
           - pulumi
           - scaleway
           - circleci-api
         requires:
           - approve_destroy

Destroy runner and clean up the environment

Let’s look at the process to clean up the created infrastructure in the destroy_runner job:

 destroy_runner:
   docker:
     - image: cimg/python:3.11
   steps:
     - checkout
     - run:
         name: Install CircleCI CLI
         command: |
           # Make CircleCI CLI available at /usr/local/bin/circleci
           curl -fLSs https://raw.githubusercontent.com/CircleCI-Public/circleci-cli/main/install.sh | sudo bash

     - run:
         name: Remove Runner token and Resource class
         command: |
           runner_resource_class="zans-cci-org/scaleway-linux-<< pipeline.number >>"
           token_output=$(circleci runner token ls $runner_resource_class)
           echo $token_output

           # Grab UUID
           runner_token_id=$(echo $token_output | grep -o -E '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}')

           echo $runner_token_id

           /usr/local/bin/circleci runner token delete $runner_token_id
           /usr/local/bin/circleci runner resource-class delete $runner_resource_class

     - pulumi/login
     - pulumi/destroy:
         stack: zmarkan-demos/cci-ml-runner/cci-runner-linux
         working_directory: pulumi

Unsurprisingly, we are again using a CircleCI hosted Docker executor to run it, and we install the CircleCI CLI again.

To destroy the runner resource class, we need to know its name (zans-cci-org/scaleway-linux-<< pipeline.number >> in our case), and we must first identify and delete its token.

token_output=$(circleci runner token ls $runner_resource_class)

           # Grab UUID
           runner_token_id=$(echo $token_output | grep -o -E '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}')

           echo $runner_token_id

           /usr/local/bin/circleci runner token delete $runner_token_id

We get the token list by running circleci runner token ls $runner_resource_class and extract the token ID using grep. The ID is in UUID format, so a regular expression that matches a UUID works well here.

Finally, we pass the token ID to runner token delete.

Once the corresponding token has been deleted, we can delete the resource class as well:

           /usr/local/bin/circleci runner resource-class delete $runner_resource_class

To destroy the corresponding infrastructure, we use the Pulumi orb again and invoke the pulumi/destroy command, which does exactly the opposite of the pulumi/update we ran when provisioning.

      - pulumi/login
      - pulumi/destroy:
          stack: zmarkan-demos/cci-ml-runner/cci-runner-linux
          working_directory: pulumi

Conclusion

This brings us to the end of our tutorial. We covered the intricacies of using CircleCI runners to execute AI/ML workloads on your own infrastructure. In our example we used Scaleway and its optimized GPU compute instances, but the principles covered can be applied to any type of infrastructure. We also showed how to take advantage of the flexibility the cloud offers by using resources only when they are needed. That is common practice for a CI/CD pipeline with a clearly defined beginning and end, but it is less common with more traditional infrastructure. By leveraging as-needed cloud resources, CircleCI can help you manage your AI/ML workflows more efficiently.

CircleCI is the leading CI/CD platform for managing and automating AI workflows. It takes minutes to set up, and you can evaluate it at no cost, with up to 6,000 minutes of free usage per month. Sign up for your free account to get started today.