Deploy and manage AI workloads on Scaleway infrastructure with CircleCI

With automation and CI/CD practices, the entire AI workflow can be run and monitored efficiently, often by just one person. Still, running AI/ML on GPU instances has its challenges. This tutorial shows you how to meet those challenges using the control and flexibility of CircleCI runners combined with Scaleway, a powerful cloud ecosystem for building, training, and deploying applications at scale. We will also demonstrate the cost effectiveness of ephemeral runners that consume only the resources required for AI/ML work.
Prerequisites
This tutorial builds on an MLOps pipeline example first introduced in our blog series on CI and CD for machine learning. We recommend you read through these first to gain a better understanding of the project you will be working with.
The repository is slightly changed from the previous examples to better work with the specific Scaleway cloud provider and Pulumi.
If you don’t have them yet, you will need to create CircleCI, Scaleway, and Pulumi accounts.
Scaleway offers 100 EUR credit for newly created accounts, which will come in handy when trying it out. This is also valid credit for certain GPU instances which the demo project uses. You will need to verify your identity and provide a payment method following their instructions.
For both CircleCI and Pulumi, you will be using the free tier.
For CircleCI, make sure you are an admin in the organization you are using for the project. Configuring new runner namespaces will require admin access.
Note: This is an advanced tutorial and not aimed at beginners, and assumes a level of familiarity with CircleCI, CI/CD concepts, and Infrastructure as Code.
High-level project flow
As the pipeline starts, first provision your environment and create a new runner resource class in CircleCI. This is how CircleCI will communicate with your Scaleway infrastructure.
Then use Pulumi to provision two instances on Scaleway:
- One instance hosts your CircleCI runner and acts as your CI/CD agent for training and deploying the model.
- The other acts as a model server, to which you will deploy your trained models.
Note: In real-world production scenarios, you would likely not provision the model serving instance from the same pipeline, as you would need it to be permanent rather than ephemeral.
After everything is provisioned, install the required dependencies, then train, test, and deploy your model, with each job executed on your newly provisioned CircleCI runner. These jobs are thoroughly covered in the CI/CD for ML blog post series, so we won’t go into detail here.
Finally, the resources are cleaned up, and the newly created CircleCI runner is removed so the pipeline can run again.
Walkthrough and project setup
We recommend that you fork the sample repository and continue from there. You can also clone it directly using this command:
git clone https://github.com/CIRCLECI-GWP/circleci-deploy-ml-scaleway.git
cd circleci-deploy-ml-scaleway
If you have cloned the repository, make sure to push it to your own GitHub account, as you will need to connect it to CircleCI. This guide will show you how to get started, and then walk you through the files that comprise the pipeline.
Preparing environment variables
You will need a number of secrets and environment variables set up before the pipeline can be run. The secrets are split logically into four contexts:

- Create a new CircleCI API key and store it in a CircleCI context named circleci-api as CIRCLECI_CLI_TOKEN. This token is used to provision new runners from within the pipeline.
- Create a new Pulumi access token and store it in a context named pulumi as PULUMI_ACCESS_TOKEN.
- Create a Scaleway API key, which consists of two values: an access key and a secret key. Create a scaleway context and store them as SCW_ACCESS_KEY and SCW_SECRET_KEY, respectively.
Finally, create a context named ml-scaleway-demo and populate it with the following environment variables:

- DEPLOY_SERVER_USERNAME (this tutorial uses demo)
- DEPLOY_SERVER_PASSWORD (this tutorial uses demodemo)
- DEPLOY_SERVER_PATH (set to /var/models)
- MODEL_SERVER_PUBLIC_KEY (your public SSH key, which you will use to access the model server instance)
- MODEL_SERVER_SSH_KEY (your private SSH key, which you will use to access the model server instance)
Note: You can generate a new SSH key (using the ssh-keygen command) or use an existing one. You will need to copy the public key and paste it into the Scaleway console later on.
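Before moving on, it can help to confirm that all of these variables are actually visible to a job that has the contexts attached. The following is a minimal, hypothetical helper (it is not part of the sample repository) that you could run as an extra pipeline step:

# check_env.py: hypothetical helper, not part of the sample repository.
# Run it in a job that has the contexts attached to confirm the variables exist.
import os
import sys

REQUIRED = [
    "CIRCLECI_CLI_TOKEN",
    "PULUMI_ACCESS_TOKEN",
    "SCW_ACCESS_KEY",
    "SCW_SECRET_KEY",
    "DEPLOY_SERVER_USERNAME",
    "DEPLOY_SERVER_PASSWORD",
    "DEPLOY_SERVER_PATH",
    "MODEL_SERVER_PUBLIC_KEY",
    "MODEL_SERVER_SSH_KEY",
]

missing = [name for name in REQUIRED if not os.environ.get(name)]
if missing:
    print("Missing environment variables: " + ", ".join(missing), file=sys.stderr)
    sys.exit(1)
print("All required environment variables are set.")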
Pulumi project and stack setup
Pulumi is the tool that helps you provision infrastructure. It offers an SDK-based approach to infrastructure provisioning in many different programming languages, which lets you use Python here, just like the rest of the AI/ML scripts.

In Pulumi you will need to create a new project and stack. A stack is an independently configurable instance of a Pulumi project. In this tutorial, the project is located in the yemiwebby-org organization and is named cci-ml-runner. It contains one stack, cci-runner-linux.
Files for Pulumi are located in the pulumi directory. You may want to modify them with your preferred project and stack names, as well as your own Scaleway configuration.
Scaleway project setup
In Scaleway, create a new project named cci-ml-runner. Go to Project Settings, copy the Project ID (it should be in UUID format), and paste it into the file pulumi/Pulumi.cci-runner-linux.yaml where you see scaleway:project_id. Leave the rest of the file unchanged; you’ll need this specific region and zone combination to use the GPU resources.
config:
scaleway:project_id: YOUR_PROJECT_ID_UUID
scaleway:region: fr-par
scaleway:zone: fr-par-1
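These namespaced values are what the Pulumi program reads at deploy time. As a rough sketch of how a Pulumi Python program can consume them (the repository’s __main__.py may structure this differently):

import pulumi

# Provider-level settings live under the "scaleway" namespace
# (the values from Pulumi.cci-runner-linux.yaml above).
scaleway_config = pulumi.Config("scaleway")
zone = scaleway_config.require("zone")              # e.g. fr-par-1
project_id = scaleway_config.require("project_id")

# Project-level settings use the project name ("cci-ml-runner") as the
# namespace; pulumi.Config() with no argument defaults to it.
app_config = pulumi.Config()
runner_token = app_config.require("circleciRunnerToken")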
Make sure to pass in your SSH key to Scaleway using the SSH Keys section of the Scaleway console. This will allow you to SSH into the instances created by Pulumi later on.
Setting up the CircleCI pipeline
From the CircleCI dashboard, search for and select your project. Click Set Up.
You will be prompted to select a branch to run the pipeline from. Select the main branch, then click Set Up Project to trigger the pipeline using the .circleci/config.yml file in the repository. If all the prerequisites are met and the environment variables are set up correctly, the pipeline should start running.
Your pipeline has one workflow of interest: build-deploy. It chains together all the jobs, from provisioning the runner and infrastructure through training and deployment to teardown.
To understand how the pipeline works, review the .circleci/config.yml file in detail. This file defines the jobs, workflows, and executors used in the pipeline.
The first job is provision_runner.
Provision runner
This job does most of the heavy lifting for provisioning cloud infrastructure and configuring the runner. It does it all in an automated way so that instances are truly ephemeral and consume only the resources required for the rest of the AI/ML work.
The job runs in a standard CircleCI Docker executor:
jobs:
provision_runner:
docker:
- image: cimg/python:3.11
First, check out the code from the repository and write the model server public key to the ~/.ssh/id_rsa_modelserver.pub file. You will need this for SSH access later on in the tutorial.
- run:
name: Write model server public key
command: |
mkdir -p ~/.ssh
echo "$MODEL_SERVER_PUBLIC_KEY" > ~/.ssh/id_rsa_modelserver.pub
The CircleCI CLI will help you create new runner resource classes and tokens for self-hosted runners. It authenticates using the CIRCLECI_CLI_TOKEN environment variable created earlier. Install it with the following step:
- run:
name: Install CircleCI CLI
command: |
# Make CircleCI CLI available at /usr/local/bin/circleci
curl -fLSs https://raw.githubusercontent.com/CircleCI-Public/circleci-cli/main/install.sh | sudo bash
You will also install Pulumi and the Scaleway Pulumi provider using pip. This enables the provisioning logic later in the pipeline.
- run:
name: Install Pulumi & Scaleway SDK
command: |
python3 -m pip install pulumi pulumiverse_scaleway
Log into Pulumi using the CircleCI Pulumi orb:
- pulumi/login
Next, provision a new runner resource class and prepare the cloud-init file required for Pulumi to configure the virtual machine:
- run:
name: Provision new runner and prepare cloud-init file
command: |
RESOURCE_CLASS="tutorial-gwp/scaleway-linux-${CIRCLE_WORKFLOW_ID}"
The RESOURCE_CLASS variable is defined using the CIRCLE_WORKFLOW_ID to ensure uniqueness across parallel or repeated workflows.
if circleci runner resource-class list tutorial-gwp --token "$CIRCLECI_CLI_TOKEN" | awk '{print $1}' | grep -Fxq "${RESOURCE_CLASS}"; then
echo " Resource class '${RESOURCE_CLASS}' already exists. Skipping creation."
else
echo "Creating resource class '${RESOURCE_CLASS}'..."
circleci runner resource-class create "${RESOURCE_CLASS}" \
"Autoprovisioned Linux runner on Scaleway"
fi
This checks whether the resource class already exists. If not, it creates one with a descriptive label.
runner_token_response=$(circleci runner token create "${RESOURCE_CLASS}" "${RESOURCE_CLASS##*/}" --token "$CIRCLECI_CLI_TOKEN")
runner_token=$(echo "$runner_token_response" | grep "token:" | awk '{print $2}')
Generate a new token for the runner resource class and extract the actual token value using awk.
cd pulumi
python3.11 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install pulumi pulumiverse_scaleway
Inside the pulumi folder, create and activate a Python 3.11 virtual environment, then install all dependencies. This ensures Pulumi operations remain isolated and reproducible.
pulumi stack select yemiwebby-org/cci-ml-runner/cci-runner-linux
pulumi config set cci-ml-runner:circleciRunnerToken "$runner_token" --plaintext
Select the correct Pulumi stack and inject the runner token as a config variable.
sed "s/RUNNER_TOKEN/${runner_token}/g" runner_cloud_init_base.yml > runner_cloud_init.yml
Using sed, replace the placeholder RUNNER_TOKEN in the base cloud-init YAML file and write the output to a new file that Pulumi will use for provisioning.
Update the stack and apply all infrastructure changes:
- pulumi/update:
stack: yemiwebby-org/cci-ml-runner/cci-runner-linux
working_directory: pulumi
After provisioning, extract the model server’s IP and some environment variables into a .env file. Share them via CircleCI’s workspace system:
- run:
name: Store model server IP to workspace
command: |
mkdir -p workspace
echo "export DEPLOY_SERVER_HOSTNAME=$(pulumi stack output modelserver_ip --cwd pulumi --stack yemiwebby-org/cci-ml-runner/cci-runner-linux)" > workspace/.env
echo "export DEPLOY_SERVER_USERNAME=root" >> workspace/.env
echo "export DEPLOY_SERVER_PASSWORD=password" >> workspace/.env
echo "export DEPLOY_SERVER_PATH=/var/models" >> workspace/.env
- persist_to_workspace:
root: workspace
paths:
- .env
This ensures that any downstream jobs can load and use the IP address and log-in credentials generated during provisioning.
That’s it for the provision_runner job. The whole job looks like this:
jobs:
provision_runner:
docker:
- image: cimg/python:3.11.9
steps:
- checkout
- run:
name: Write model server public key
command: |
mkdir -p ~/.ssh
echo "$MODEL_SERVER_PUBLIC_KEY" > ~/.ssh/id_rsa_modelserver.pub
- run:
name: Install CircleCI CLI
command: |
# Make CircleCI CLI available at /usr/local/bin/circleci
curl -fLSs https://raw.githubusercontent.com/CircleCI-Public/circleci-cli/main/install.sh | sudo bash
- run:
name: Install Pulumi & Scaleway SDK
command: |
python3 -m pip install pulumi pulumiverse_scaleway
- pulumi/login
- run:
name: Provision new runner and prepare cloud-init file
command: |
RESOURCE_CLASS="tutorial-gwp/scaleway-linux-${CIRCLE_WORKFLOW_ID}"
echo "Checking for existing resource class: ${RESOURCE_CLASS}"
if [ -z "$RESOURCE_CLASS" ]; then
echo " RESOURCE_CLASS is empty. Exiting."
exit 1
fi
if circleci runner resource-class list tutorial-gwp --token "$CIRCLECI_CLI_TOKEN" | awk '{print $1}' | grep -Fxq "${RESOURCE_CLASS}"; then
echo " Resource class '${RESOURCE_CLASS}' already exists. Skipping creation."
else
echo "Creating resource class '${RESOURCE_CLASS}'..."
circleci runner resource-class create "${RESOURCE_CLASS}" \
"Autoprovisioned Linux runner on Scaleway"
fi
echo "Generating new runner token..."
runner_token_response=$(circleci runner token create "${RESOURCE_CLASS}" "${RESOURCE_CLASS##*/}" --token "$CIRCLECI_CLI_TOKEN")
runner_token=$(echo "$runner_token_response" | grep "token:" | awk '{print $2}')
if [ -z "$runner_token" ]; then
echo "Failed to extract runner token. Exiting."
exit 1
fi
echo "Runner token created: ${#runner_token} characters long"
echo "Moving into Pulumi folder..."
cd pulumi
echo "Creating venv with Python 3.11 explicitly..."
python3.11 -m venv venv
source venv/bin/activate
echo "Installing Pulumi SDK and dependencies...."
pip install --upgrade pip
pip install pulumi pulumiverse_scaleway
echo "Selecting Pulumi stack..."
pulumi stack select yemiwebby-org/cci-ml-runner/cci-runner-linux
echo "Setting Pulumi config..."
pulumi config set cci-ml-runner:circleciRunnerToken "$runner_token" --plaintext
echo "Preparing cloud-init file..."
sed "s/RUNNER_TOKEN/${runner_token}/g" runner_cloud_init_base.yml > runner_cloud_init.yml
- pulumi/update:
stack: yemiwebby-org/cci-ml-runner/cci-runner-linux
working_directory: pulumi
- run:
name: Store model server IP to workspace
command: |
mkdir -p workspace
echo "export DEPLOY_SERVER_HOSTNAME=$(pulumi stack output modelserver_ip --cwd pulumi --stack yemiwebby-org/cci-ml-runner/cci-runner-linux)" > workspace/.env
echo "export DEPLOY_SERVER_USERNAME=root" >> workspace/.env
echo "export DEPLOY_SERVER_PASSWORD=password" >> workspace/.env
echo "export DEPLOY_SERVER_PATH=/var/models" >> workspace/.env
- persist_to_workspace:
root: workspace
paths:
- .env
Pulumi provisioning scripts and Scaleway GPU resource configuration
Pulumi scripts live in the pulumi directory of the project. This is where you configure the resources. The files of note are:

- Pulumi.yaml: Project and language configuration for the SDK.
- Pulumi.cci-runner-linux.yaml: Configuration specific to this stack, such as the Scaleway project ID, region, and zone. You populated that earlier with your project ID.
- requirements.txt: Python dependency spec for Pulumi.
- __main__.py: The main Pulumi script for declaring resources.
- runner_cloud_init_base.yml: cloud-init template script for the CircleCI runner instance, executed at first boot to bootstrap it.
- modelserver_cloud_init.yml: cloud-init script for the model server instance.
Provisioning resources with Pulumi
__main__.py provisions two Scaleway instances using the Server class from the pulumiverse_scaleway provider: one for the CircleCI runner (modelTrainingCCIRunner) and the other for serving models (tensorflowServer). Let’s walk through them, starting with the runner:
modelTrainingCCIRunner = Server(
"runnerServerLinux",
zone=zone,
type="GP1-XS",
image="ubuntu_jammy",
ip_id=runner_ip.id,
root_volume={
"size_in_gb": 80,
"volume_type": "sbs_volume",
},
cloud_init=cloud_init_runner,
)
The instance is labeled runnerServerLinux, which is how it will appear in the Scaleway dashboard. This is the machine that will run your CircleCI jobs.
- Type: The instance type is GP1-XS, a small general purpose instance suitable for lightweight ML tasks; for heavier training you can substitute one of Scaleway’s GPU instance types.
- Image: ubuntu_jammy is a clean Ubuntu 22.04 image.
- IP address: runner_ip.id, a reserved public IP, is attached.
- Volume: An 80 GB SSD volume is attached using Scaleway’s newer volume type, sbs_volume.
- Cloud-init: The cloud_init_runner script is read from file and injected with the runner token generated earlier in the pipeline.
The cloud_init_runner is prepared with the following snippet:
with open("runner_cloud_init_base.yml") as f:
cloud_init_runner = f.read().replace("RUNNER_TOKEN", runner_token)
cloud_init_runner = f"""#cloud-config
{cloud_init_runner}
"""
The base cloud-init script is read, the placeholder token is replaced, and the required #cloud-config header is prepended. This enables the runner VM to register itself with CircleCI automatically upon boot.
Next, define the CPU-based server for serving TensorFlow models:
tensorflowServer = Server(
"tensorflowServerLinux",
zone=zone,
type="DEV1-L",
image="ubuntu_jammy",
ip_id=server_ip.id,
root_volume={
"size_in_gb": 40,
"volume_type": "sbs_volume",
},
cloud_init=cloud_init_modelserver,
)
This uses a more affordable CPU-based instance type (DEV1-L) and a smaller 40 GB SSD volume. This server will be used to serve models via Docker. Note that it’s possible to use a GPU instance here too, but Scaleway’s free plan allows only one GPU VM at a time, so this approach helps keep costs minimal for the tutorial.
The cloud_init_modelserver script does a few things:

- Sets up a user named demo with SSH access and sudo rights.
- Installs Docker and its dependencies.
- Sets up a folder structure under /var/models for staging and production models.
This cloud-init is constructed as follows:
with open("modelserver_cloud_init.yml") as f:
cloud_init_modelserver = f.read()
with open(os.path.expanduser("~/.ssh/id_rsa_modelserver.pub")) as f:
public_key = f.read().strip()
cloud_init_modelserver = f"""#cloud-config
users:
- name: demo
...
ssh-authorized-keys:
- {public_key}
...
"""
Embed the local SSH public key into the authorized keys section of the cloud-init to enable login access for the demo user. This is useful for debugging or remote file management.
Finally, export key outputs for downstream use in the CircleCI workflow:
pulumi.export("cci_runner_ip", modelTrainingCCIRunner.public_ips)
pulumi.export("cci_runner_id", modelTrainingCCIRunner.id)
pulumi.export("modelserver_id", tensorflowServer.id)
pulumi.export("modelserver_ip", server_ip.address)
Note: modelTrainingCCIRunner.public_ips is used here because the new pulumiverse_scaleway provider returns a list of public IPs, while server_ip.address is explicitly pulled from the reserved IP resource for the model server.

These outputs are later picked up in the pipeline and written to .env for SSH and deployment purposes.
Cloud-init scripts for CircleCI runner server
Now, let’s look at the updated runner_cloud_init_base.yml, which bootstraps the CircleCI runner instance with Docker and the CircleCI Machine Runner:
#!/bin/sh
export runner_token="RUNNER_TOKEN"
echo "Runner token $runner_token"
# Install Docker first
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable"
sudo apt update
sudo apt install -y docker.io python3-pip python3-venv
sudo systemctl enable docker
sudo systemctl start docker
# Now install CircleCI runner
curl -s https://packagecloud.io/install/repositories/circleci/runner/script.deb.sh?any=true | sudo bash
sudo apt install -y circleci-runner
# Give the circleci user docker access
sudo usermod -aG docker circleci
# Inject your runner token
sudo sed -i "s/RUNNER_TOKEN/$runner_token/g" /etc/circleci-runner/circleci-runner-config.yaml
# Enable & start the runner service
sudo systemctl enable circleci-runner
sudo systemctl start circleci-runner
# Dump its status to the logs (non-blocking)
sudo systemctl status circleci-runner --no-pager || true
This shell script is executed automatically on the first boot of the runner instance to prepare it as a CircleCI Machine Runner.
Here’s what it does:
- Sets the runner token: The RUNNER_TOKEN placeholder is dynamically injected from the Pulumi stack and exported to be used throughout the script.
- Installs Docker and dependencies: Before installing the CircleCI runner, the script installs Docker and essential Python tooling (python3-pip, python3-venv). It also enables and starts the Docker service.
- Installs the CircleCI runner: The script pulls the CircleCI runner APT source, then installs the circleci-runner package.
- Grants Docker access: The circleci user is added to the docker group to allow jobs to run Docker commands if needed.
- Injects the runner token: The runner token is inserted into the CircleCI config file located at /etc/circleci-runner/circleci-runner-config.yaml.
- Enables and starts the runner service: The runner service is enabled and started via systemctl, making the instance immediately available to CircleCI pipelines.
- Prints runner status to logs (optional): Finally, the script logs the runner status for debugging purposes. The use of || true ensures the script doesn’t fail even if the status command exits non-zero.
Note: Keep the systemctl start circleci-runner instruction as the last step. Starting the runner early in the script might expose an incomplete setup to CircleCI, leading to flaky job behavior or missing dependencies.
This cloud-init file ensures the CircleCI runner is properly installed, secure, and Docker-ready, making the instance fully operational as soon as it boots.
Model server cloud-init script
Now, let’s look at the other cloud-init script: modelserver_cloud_init.yml.

This script installs Docker Engine, sets up the tensorflow-serving Docker container, and prepares your environment for pushing and deploying new models.

This also creates a new SSH user, demo, which will be used by the pipeline to push models to the server.
The Docker Engine installation steps are taken from this article on Scaleway’s tutorials site; refer to it if you want more details.
To prepare the serving and upload directories, you first need to create /var/models and set ownership to the docker group:
# Create the directories and grant permissions so that the user defined in the .env file and docker can read/write to them
sudo mkdir -p /var/models/staging # so that docker will have something to bind to, it will be populated later
sudo mkdir -p /var/models/prod
sudo chown -R $USER:docker /var/models
sudo chmod -R 775 /var/models
Next, create the demo user that will be used for uploading models from the pipeline:
# Create demo user for SFTP upload if not exists
if ! id demo &>/dev/null; then
useradd -m -s /bin/bash demo
fi
usermod -aG docker demo
Finally, pull the tensorflow/serving image and run the container:
# Download the TensorFlow Serving Docker image and repo
docker pull tensorflow/serving
# Run TensorFlow Serving container (if not already running)
if ! docker ps -a --format '{{.Names}}' | grep -w tensorflow_serving; then
docker run -d --name tensorflow_serving -p 8501:8501 \
-v /var/models/prod:/models/my_model \
-e MODEL_NAME=my_model tensorflow/serving
fi
Your model server is ready for the newly trained models to be uploaded.
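If you want to confirm that serving came up correctly, TensorFlow Serving exposes a model status endpoint on the same port (8501). Here is a minimal sketch of such a check; it assumes the requests package is installed and that MODEL_SERVER_IP is a variable you set to the instance’s public IP (neither is part of the sample project):

# check_serving.py: hypothetical verification script, not part of the repository.
# Queries TensorFlow Serving's model status endpoint on port 8501.
import os
import requests

host = os.environ.get("MODEL_SERVER_IP", "127.0.0.1")  # assumed variable for this sketch
url = f"http://{host}:8501/v1/models/my_model"

response = requests.get(url, timeout=10)
response.raise_for_status()
print(response.json())  # lists the model versions and their serving state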
Using the new runner resource class to run AI/ML workloads in a CircleCI pipeline
After provisioning of cloud infrastructure and runner resources is complete, you can move on to the “fun stuff”. The subsequent jobs (install-build, train, test, package, deploy, and test-deployment) are based on a previous blog post. You can review that tutorial for more detail. This tutorial covers the differences between the two repositories that are specific to building on Scaleway’s infrastructure.
In short, for each of the jobs there is a corresponding Python script in the ml/ directory that scripts the tasks for that segment of the pipeline: building the dataset, training, testing, and so on. The jobs then use the CircleCI workspace to pass the intermediary artifacts between them.
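To give a concrete sense of the artifact these scripts hand off, a training step typically exports the model in TensorFlow’s SavedModel format, which is what TensorFlow Serving expects on the model server. The snippet below is only an illustrative sketch (assuming TensorFlow 2.16+ with Keras 3), not the repository’s actual training code:

# Illustrative sketch only; the real logic lives in the ml/ scripts.
import numpy as np
import tensorflow as tf

# A tiny stand-in model; the sample project trains something more meaningful.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(np.random.rand(32, 4), np.random.rand(32, 1), epochs=1, verbose=0)

# Export a SavedModel into a numbered version directory, the layout
# TensorFlow Serving looks for under /models/my_model/<version>/.
model.export("workspace/model/1")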
In all the jobs, you will use the newly created executor default_linux_machine. For ease of reuse, it is declared in the config.yml:
executors:
default_linux_machine:
machine:
image: ubuntu-2204:current
All the jobs declare this executor, which runs them on your Scaleway instance.
install-build:
executor: default_linux_machine
steps: …
When the model has been trained and tested, it’s time to package and deploy it. You will use your model server instance for that, with the newly created instance’s IP passed along via the workspace.
You have already used the workspace to store the .env file containing DEPLOY_SERVER_HOSTNAME. You now need to retrieve it in the package job.

The populate-env command grabs it from the workspace and makes it available to other commands in the job as an environment variable via the $BASH_ENV file:
populate-env:
steps:
- attach_workspace:
at: .
- run:
name: Restore secrets from workspace and add to environment vars
# Environment variables must be configured in a CircleCI project or context
command: |
cat .env >> $BASH_ENV
source $BASH_ENV
The ml/4_package.py script then takes the created model and uses SFTP to upload it to the model server’s staging directory.
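The exact upload code lives in the repository, but the core of an SFTP upload like this can be sketched with paramiko. This is a simplified illustration, not the actual 4_package.py; the artifact name local_model.zip is a placeholder, and paramiko is assumed to be installed:

# Simplified illustration of an SFTP upload; not the repository's 4_package.py.
import os
import paramiko

host = os.environ["DEPLOY_SERVER_HOSTNAME"]      # set by populate-env from the workspace .env
username = os.environ["DEPLOY_SERVER_USERNAME"]  # "demo" in this tutorial
password = os.environ["DEPLOY_SERVER_PASSWORD"]
remote_dir = os.environ["DEPLOY_SERVER_PATH"]    # /var/models

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())  # fine for a demo; pin host keys in production
client.connect(host, username=username, password=password)

sftp = client.open_sftp()
sftp.put("local_model.zip", f"{remote_dir}/staging/local_model.zip")  # placeholder artifact name
sftp.close()
client.close()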
The same workspace attachment and environment restore are performed again in the deploy and test-deployment jobs, which also need access to the same IP address for their work.
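For reference, a smoke test in the spirit of test-deployment could call the TensorFlow Serving REST API directly. The sketch below is only an illustration; the real check lives in the ml/ scripts, and the payload shape must match the model you actually trained:

# Illustrative smoke test against TensorFlow Serving's REST API; not the repository's test script.
import os
import requests

host = os.environ["DEPLOY_SERVER_HOSTNAME"]
url = f"http://{host}:8501/v1/models/my_model:predict"

# Placeholder payload; shape and values must match the deployed model's inputs.
payload = {"instances": [[0.1, 0.2, 0.3, 0.4]]}

response = requests.post(url, json=payload, timeout=10)
response.raise_for_status()
print(response.json()["predictions"])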
Now, review the orchestrated build-deploy workflow with all the jobs:
workflows:
# This workflow does a full build from scratch and deploys the model
build-deploy:
jobs:
- provision_runner:
context:
- pulumi
- scaleway
- circleci-api
- install-build:
requires:
- provision_runner
context:
- ml-scaleway-demo
- train:
requires:
- install-build
- test:
requires:
- train
- package:
requires:
- test
context:
- ml-scaleway-demo
# Do not deploy without manual approval - you can inspect the console output from training and make sure you are happy to deploy
- deploy:
requires:
- package
context:
- ml-scaleway-demo
- test-deployment:
requires:
- deploy
context:
- ml-scaleway-demo
- approve_destroy:
type: approval
requires:
- test-deployment
- destroy_runner:
context:
- pulumi
- scaleway
- circleci-api
requires:
- approve_destroy
The workflow definition orchestrates all the jobs in your pipeline and the conditions under which they execute. It also passes in the contexts containing the right environment variables for each job.
For example, your provision_runner job needs access to multiple environment variables to authenticate with Pulumi, Scaleway, and the CircleCI API, so you pass all three contexts to it.
jobs:
- provision_runner:
context:
- pulumi
- scaleway
- circleci-api
Similarly, the deploy job needs access to the deployment server credentials and path, which are stored in the ml-scaleway-demo context:
- deploy:
requires:
- package
context:
- ml-scaleway-demo
The final jobs in the workflow are approve_destroy, which introduces a manual approval gate, and destroy_runner, which cleans up the infrastructure you created:
- approve_destroy:
type: approval
requires:
- test-deployment
- destroy_runner:
context:
- pulumi
- scaleway
- circleci-api
requires:
- approve_destroy
Destroy runner and clean up the environment
The destroy_runner job is responsible for tearing down the infrastructure provisioned earlier in the pipeline, specifically the CircleCI self-hosted runner and associated resources (VM, public IP, and volumes). This is important for keeping costs low and ensuring that resources are ephemeral after use.
The job runs in a Docker environment using Python 3.11.9:
jobs:
destroy_runner:
docker:
- image: cimg/python:3.11.9
Check out the project repository:
- checkout
Next, install the CircleCI CLI so you can interact with CircleCI resource classes or tokens if needed (not strictly used here but kept for consistency and future extensibility):
- run:
name: Install CircleCI CLI
command: |
curl -fLSs https://raw.githubusercontent.com/CircleCI-Public/circleci-cli/main/install.sh | sudo bash
Pulumi is then authenticated using the provided API token via the pulumi/login command. This allows the job to run Pulumi operations securely:
- pulumi/login
Finally, destroy the Pulumi stack associated with your infrastructure:
- pulumi/destroy:
stack: yemiwebby-org/cci-ml-runner/cci-runner-linux
working_directory: pulumi
This command runs pulumi destroy on the specified stack inside the pulumi folder. It deletes all cloud resources (e.g., Scaleway servers, IPs, and volumes) defined in the __main__.py file, effectively cleaning up the environment.
Conclusion
This wraps up the tutorial. We covered the intricacies of using CircleCI runners to execute AI/ML workloads on your own infrastructure. In our example we used Scaleway cloud and its compute instances, but the principles covered can be applied to any type of infrastructure. We also covered using the flexibility that the cloud offers to consume resources only when needed. That is common practice for a CI/CD pipeline with a clearly defined beginning and end, but it is less common with more traditional infrastructure. By leveraging as-needed cloud resources, CircleCI can help you manage your AI/ML workflows more efficiently.
CircleCI is the leading CI/CD platform for managing and automating AI workflows. It takes minutes to set up, and you can evaluate it at no cost, with up to 6,000 minutes of free usage per month. Sign up for your free account to get started today.