circleci/evals@2.0.1

Certified
Orb for running assertions on LLM evaluation results. Email ai-feedback@circleci.com for feedback.
Created: April 30, 2024
Version Published: November 21, 2024
Releases: 10
Org Usage: < 25

Orb Quick Start Guide

Use CircleCI version 2.1 at the top of your .circleci/config.yml file.

    version: 2.1

Add the orbs stanza below your version, invoking the orb:

    orbs:
      evals: circleci/evals@2.0.1

Use evals elements in your existing workflows and jobs.

Usage Examples

run_evals_orb_test_command

Run assertions using the evals orb

    version: '2.1'
    orbs:
      evals: circleci/evals@2.0
    jobs:
      evals-test-assertions-job:
        docker:
          - image: cimg/base:current-22.04
        steps:
          - checkout
          - evals/test:
              assertions: assertions.json
              metrics: eval_results.json
              results: test_results.xml
    workflows:
      test-eval-workflow:
        jobs:
          - evals-test-assertions-job
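The example above assumes that `assertions.json` and `eval_results.json` already exist in the working directory. In practice the metrics file is usually produced earlier in the same job. The sketch below shows one way that might look; the `./scripts/run_evals.sh` script and the idea that it writes `eval_results.json` are hypothetical and not part of the orb.

```yaml
version: 2.1
orbs:
  evals: circleci/evals@2.0
jobs:
  evals-test-assertions-job:
    docker:
      - image: cimg/base:current-22.04
    steps:
      - checkout
      - run:
          name: Generate evaluation metrics
          # Hypothetical script: assumed to write eval_results.json
          # into the working directory.
          command: ./scripts/run_evals.sh
      - evals/test:
          assertions: assertions.json   # assumed to be checked into the repository
          metrics: eval_results.json    # produced by the step above
          results: test_results.xml
workflows:
  test-eval-workflow:
    jobs:
      - evals-test-assertions-job
```

According to the orb source, the `test` command itself stores the results file as both test results and an artifact, so no separate `store_test_results` step should be needed in the job.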

Jobs

Commands

test

This command runs assertions on evaluation metrics, saves results in JUnit XML format, and makes them available in CircleCI's Tests tab.

PARAMETER    DESCRIPTION                                 REQUIRED   DEFAULT            TYPE
assertions   path to the JSON assertions file            Yes        -                  string
metrics      path to the JSON evaluation metrics file    Yes        -                  string
results      path to store the JUnit XML results file    No         test_results.xml   string
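Because `results` has a default, a step only has to pass the two required parameters. A minimal sketch of such a step, reusing the file names from the usage example (any other paths would work the same way):

```yaml
steps:
  - checkout
  - evals/test:
      assertions: assertions.json
      metrics: eval_results.json
      # results is omitted, so the JUnit XML report is written to
      # the default path, test_results.xml
```

Setting `results` explicitly is mainly useful when several `evals/test` steps run in one job and each needs its own report file.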

Executors

Orb Source

    # This code is licensed from CircleCI to the user under the MIT license.
    # See here for details: https://circleci.com/developer/orbs/licensing
    version: 2.1
    description: |
      Orb for running assertions on LLM evaluation results. Email ai-feedback@circleci.com for feedback.
    commands:
      test:
        description: This command runs assertions on evaluation metrics, saves results in JUnit XML format, and makes them available in CircleCI's Tests tab.
        parameters:
          assertions:
            description: path to the JSON assertions file
            type: string
          metrics:
            description: path to the JSON evaluation metrics file
            type: string
          results:
            default: test_results.xml
            description: path to store the JUnit XML results file
            type: string
        steps:
          - run:
              command: |
                #!/usr/bin/env bash

                # Hi! Based off the slack-orb-go main.sh. If you plan to have osx & windows
                # support go look at that file and bring it back in.

                # Determine the http client to use
                # Returns 1 if no HTTP client is found
                determine_http_client() {
                    if command -v curl >/dev/null 2>&1; then
                        HTTP_CLIENT=curl
                    elif command -v wget >/dev/null 2>&1; then
                        HTTP_CLIENT=wget
                    else
                        return 1
                    fi
                }

                # Download a executable file
                # $1: The path to save the file to
                # $2: The URL to download the file from
                # $3: The HTTP client to use (curl or wget)
                download_executable() {
                    if [ "$3" = "curl" ]; then
                        set -x
                        curl --fail --retry 3 -L -o "$1" "$2"
                        set +x
                    elif [ "$3" = "wget" ]; then
                        set -x
                        wget --tries=3 --timeout=10 --quiet -O "$1" "$2"
                        set +x
                    else
                        return 1
                    fi
                }

                detect_os() {
                    # TODO: does uname work on windows?
                    detected_platform="$(uname -s | tr '[:upper:]' '[:lower:]')"

                    case "$detected_platform" in
                    linux*) PLATFORM=linux ;;
                    darwin*) PLATFORM=darwin ;;
                    msys* | cygwin*) PLATFORM=windows ;;
                    *) return 1 ;;
                    esac
                }

                detect_arch() {
                    detected_arch="$(uname -m)"

                    case "$detected_arch" in
                    x86_64 | amd64) ARCH=x86_64 ;;
                    # i386 | i486 | i586 | i686) ARCH=386 ;;
                    arm64 | aarch64) ARCH=arm64 ;;
                    # arm*) ARCH=arm ;;
                    *) return 1 ;;
                    esac
                }

                # Confirm we have unzip available
                # Returns 1 if unzip not found
                detect_unzip() {
                    if command -v unzip >/dev/null 2>&1; then
                        return 0
                    fi
                    return 1
                }

                # Confirm we have tar available
                # Returns 1 if tar not found
                detect_tar() {
                    if command -v tar >/dev/null 2>&1; then
                        return 0
                    fi
                    return 1
                }

                # Print a warning message
                # $1: The warning message to print
                print_warn() {
                    yellow="\033[1;33m"
                    normal="\033[0m"
                    printf "${yellow}%s${normal}\n" "$1"
                }

                # Print a success message
                # $1: The success message to print
                print_success() {
                    green="\033[0;32m"
                    normal="\033[0m"
                    printf "${green}%s${normal}\n" "$1"
                }

                # Print an error message
                # $1: The error message to print
                print_error() {
                    red="\033[0;31m"
                    normal="\033[0m"
                    printf "${red}%s${normal}\n" "$1"
                }

                print_warn "Thank you for trying it out and please provide feedback to us at CircleCI Orbs discussion forum: https://discuss.circleci.com/c/orbs"

                if command -v evals >/dev/null 2>&1; then
                    print_success "Evals cli is already installed."
                    exit 0
                fi

                if ! detect_os; then
                    print_error "Unsupported operating system: $(uname -s)."
                    exit 1
                fi
                printf '%s\n' "Operating system: $PLATFORM."

                if ! detect_arch; then
                    print_error "Unsupported architecture: $(uname -m)."
                    exit 1
                fi
                printf '%s\n' "Architecture: $ARCH."

                if [ "$PLATFORM" = "windows" ]; then
                    if ! detect_unzip; then
                        print_error "Unzip is required to download the Evals Orb executable."
                        exit 1
                    fi
                else
                    if ! detect_tar; then
                        print_error "Tar is required to download the Evals Orb executable."
                        exit 1
                    fi
                fi

                # WARN: Do not touch this line.
                # this will be replaced with the actual version of the evals cli in CI
                # as part of orb-inject-cli-version job which is run on release filters in CCI workflow
                version="2.0.1"
                s3_host="https://circleci-binary-releases.s3.amazonaws.com"
                working_directory="$(printf "%s" "$CIRCLE_WORKING_DIRECTORY" | sed "s|~|$HOME|")"
                orb_cache_dir="${working_directory}/.circleci/orbs/cache"
                orb_bin_dir="${working_directory}/.circleci/orbs/bin"
                archive_file_name="evals_${version}_${PLATFORM}_${ARCH}.tar.gz"
                executable_checksums_name="evals_${version}_checksums.txt"
                executable_checksums_path="evals/${version}/${executable_checksums_name}"
                archive_file_path="${orb_cache_dir}/${archive_file_name}"

                # ------------------------------ Download executable ------------------------------
                if [ ! -f "$archive_file_path" ]; then
                    mkdir -p "$orb_bin_dir"
                    mkdir -p "$orb_cache_dir"
                    if ! determine_http_client; then
                        printf '%s\n' "cURL or wget is required to download the required executable."
                        printf '%s\n' "Please install cURL or wget and try again."
                        exit 1
                    fi
                    printf '%s\n' "HTTP client: $HTTP_CLIENT."

                    archive_url="${s3_host}/evals/${version}/${archive_file_name}"
                    printf '%s\n' "Release URL: $archive_url."

                    if ! download_executable "$archive_file_path" "$archive_url" "$HTTP_CLIENT"; then
                        printf '%s\n' "Failed to download evals executable."
                        exit 1
                    fi

                    printf '%s\n' "Downloaded evals executable to $orb_bin_dir"
                else
                    printf '%s\n' "Skipping executable download since it already exists at $archive_file_path."
                fi

                # ------------------------------ Download checksums ------------------------------
                executable_checksums_file_url="${s3_host}/${executable_checksums_path}"
                if ! download_executable "$orb_cache_dir/$executable_checksums_name" "$executable_checksums_file_url" "$HTTP_CLIENT"; then
                    printf '%s\n' "Failed to download checksum file."
                    exit 1
                fi

                # ------------------------------ Validate executable ------------------------------
                expected_checksum=$(grep "$archive_file_name" "$orb_cache_dir/$executable_checksums_name" | awk '{print $1}')

                if [ -n "$expected_checksum" ]; then
                    actual_sha256=""
                    if [ "$PLATFORM" = "darwin" ]; then
                        actual_sha256=$(shasum -a 256 "$archive_file_path" | cut -d' ' -f1)
                    else
                        actual_sha256=$(sha256sum "$archive_file_path" | cut -d' ' -f1)
                    fi

                    if [ "$actual_sha256" != "$expected_checksum" ]; then
                        print_error "SHA256 checksum does not match. Expected $expected_checksum but got $actual_sha256"
                        exit 1
                    else
                        print_success "SHA256 checksum matches. executable is valid."
                    fi
                else
                    print_warn "SHA256 checksum not found. Skipping checksum validation."
                fi

                # ------------------------------ Extract executable ------------------------------
                printf '%s\n' "Untar ${archive_file_path}..."
                if ! tar -xf "${archive_file_path}" -C "$orb_bin_dir"; then
                    print_error "Failed to untar $archive_file_path."
                    exit 1
                fi

                # ------------------------------ Move executable to PATH ------------------------------
                # Where to move the executable
                path_destination="$HOME/bin"
                printf '%s\n' "Moving evals executable to PATH"
                mkdir -p "$path_destination"
                # shellcheck disable=SC2016
                # shellcheck disable=SC2086
                echo 'export PATH='${path_destination}':$PATH' >> "$BASH_ENV"
                # shellcheck disable=SC1090
                # shellcheck disable=SC3046
                source "$BASH_ENV"

                if ! mv "$orb_bin_dir/evals" "${path_destination}/evals"; then
                    print_error "Failed to move $orb_bin_dir/evals executable executable."
                    exit 1
                fi

                print_success "Successfully installed evals cli."
              name: Download eval binary
          - run:
              command: evals test --metrics << parameters.metrics >> --assertions << parameters.assertions >> --results << parameters.results >>
              name: Run test
          - store_test_results:
              path: << parameters.results >>
          - store_artifacts:
              path: << parameters.results >>
    executors: {}
    jobs: {}
    examples:
      run_evals_orb_test_command:
        description: |
          Run assertions using the evals orb
        usage:
          version: "2.1"
          orbs:
            evals: circleci/evals@2.0
          jobs:
            evals-test-assertions-job:
              docker:
                - image: cimg/base:current-22.04
              steps:
                - checkout
                - evals/test:
                    assertions: assertions.json
                    metrics: eval_results.json
                    results: test_results.xml
          workflows:
            test-eval-workflow:
              jobs:
                - evals-test-assertions-job