Container runner performance benchmarks
Runner benchmarks show the performance trade-offs of CircleCI self-hosted runners up to the point where the error rate becomes unacceptable. From the results below, you can see there is a trade-off between the following:
- ReplicaSet
- Concurrency
- Tasks
- Queue
- Run time
Depending on your team’s workload types (for example, high parallelism or fan-in/fan-out), you may need to tune your cluster for high concurrency and task counts, which can affect queue time, run time, and other factors.
By publishing our benchmarks we can make measurable improvements to the performance and scale of CircleCI self-hosted runner, and show the impact of those improvements.
These benchmarks used a GKE (Google) cluster with 5 dedicated E2-medium nodes. The cos_containerd image was used with GKE version 1.29.4 and no autoscaling.
The tables below detail the aggregation of results from testing the 3.1 self-hosted runner (GOAT) compared to the same tests run with the 3.0 self-hosted runner. Version 3.1 introduced a major re-architecture of container runner to address performance, stability, and reliability. For more technical background on 3.1, refer to the runner-init project’s README.
Compared with the 3.0 self-hosted runner, GOAT shows a net improvement across all four categories. The most notable improvement is the reduction in queue times. GOAT also demonstrated a minor but notable decrease in failed runs, showing an improvement when run under stress.
GOAT benchmarks
| Total Tasks | Max Concurrency | Replica Count | Failure Rate (GOAT) | Failure Rate (3.0) | Average Run Time (GOAT) | Improvement From 3.0 | Max Run Time (GOAT) | Improvement From 3.0 | Average Queue Time (GOAT) | Improvement From 3.0 | Max Queue Time (GOAT) | Improvement From 3.0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 20 | 20 | 1 | 0.0% | 0.0% | 934.8 | 10% | 11196.8 | -350% | 5173.3 | 50% | 12792.5 | 50% |
| 20 | 20 | 2 | 20.0% | 0.0% | 955.6 | -10% | 2602.0 | -10% | 4074.2 | 50% | 10249.9 | 50% |
| 20 | 20 | 3 | 30.0% | 0.0% | 914.0 | 0% | 2546.8 | -10% | 4037.3 | 50% | 9850.8 | 50% |
| 20 | 40 | 1 | 10.0% | 0.6% | 1046.7 | -20% | 3784.8 | 20% | 4926.4 | 60% | 12303.0 | 50% |
| 20 | 40 | 2 | 20.0% | 0.0% | 899.4 | 20% | 9896.0 | -260% | 4546.5 | 50% | 10972.2 | 50% |
| 20 | 40 | 3 | 30.0% | 0.0% | 934.3 | -10% | 2285.8 | 10% | 3914.5 | 50% | 9070.2 | 50% |
| 20 | 80 | 1 | 10.0% | 0.0% | 930.0 | -20% | 3156.4 | -40% | 4358.7 | 60% | 11504.0 | 50% |
| 20 | 80 | 2 | 20.0% | 0.0% | 892.3 | 0% | 2864.5 | 0% | 4164.0 | 50% | 10028.6 | 40% |
| 20 | 80 | 3 | 30.0% | 0.0% | 975.5 | -10% | 2551.2 | 10% | 3862.6 | 50% | 9389.4 | 50% |
| 40 | 20 | 1 | 10.0% | 0.0% | 857.0 | 0% | 3685.0 | -20% | 8345.5 | 50% | 28749.8 | 40% |
| 40 | 20 | 2 | 20.0% | 0.6% | 1332.7 | -10% | 4023.9 | 10% | 5199.1 | 70% | 14260.5 | 60% |
| 40 | 20 | 3 | 30.0% | 0.9% | 1415.6 | -30% | 4421.7 | 0% | 5571.4 | 60% | 15838.6 | 50% |
| 40 | 40 | 1 | 10.0% | 0.3% | 1875.5 | -50% | 9882.1 | 30% | 7290.1 | 60% | 22519.5 | 50% |
| 40 | 40 | 2 | 20.0% | 0.3% | 1088.4 | -10% | 3786.4 | -10% | 5639.7 | 60% | 16400.2 | 60% |
| 40 | 40 | 3 | 30.0% | 0.0% | 1357.3 | -10% | 4259.0 | 0% | 5657.6 | 60% | 15374.1 | 50% |
| 40 | 80 | 1 | 10.0% | 0.0% | 910.6 | 0% | 2673.0 | 10% | 7457.9 | 60% | 23214.5 | 50% |
| 40 | 80 | 2 | 20.0% | 0.6% | 1393.7 | -10% | 5733.2 | 30% | 5582.0 | 60% | 16533.2 | 60% |
| 40 | 80 | 3 | 30.0% | 0.0% | 1347.5 | -20% | 3954.9 | 0% | 6279.2 | 50% | 15712.4 | 50% |
| 80 | 20 | 1 | 10.0% | 0.1% | 855.0 | 10% | 2718.1 | 20% | 16916.3 | 40% | 64214.3 | 30% |
| 80 | 20 | 2 | 20.0% | 0.3% | 1259.3 | 30% | 4594.2 | 60% | 9396.9 | 60% | 32960.3 | 50% |
| 80 | 20 | 3 | 30.0% | 0.4% | 2123.9 | 0% | 9011.4 | 10% | 8371.7 | 60% | 28033.0 | 60% |
| 80 | 40 | 1 | 10.0% | 0.0% | 1651.7 | -60% | 10064.1 | -60% | 15135.1 | 50% | 59203.1 | 30% |
| 80 | 40 | 2 | 20.0% | 0.3% | 1960.8 | 20% | 8564.9 | 10% | 9610.8 | 60% | 30498.3 | 50% |
| 80 | 40 | 3 | 30.0% | 0.3% | 2781.8 | -30% | 9568.8 | 10% | 9575.0 | 60% | 28208.7 | 50% |
| 80 | 80 | 1 | 10.0% | 0.1% | 959.0 | 50% | 2922.2 | 80% | 13394.6 | 60% | 44709.2 | 50% |
| 80 | 80 | 2 | 20.0% | 0.3% | 1519.4 | 10% | 5510.7 | 20% | 9007.4 | 60% | 28765.2 | 60% |
| 80 | 80 | 3 | 30.0% | 0.1% | 2344.5 | 10% | 9807.7 | 40% | 8877.2 | 60% | 30428.8 | 60% |
| 128 | 20 | 1 | 10.0% | 0.1% | 896.8 | 10% | 4684.8 | 0% | 27022.6 | 30% | 106310.2 | 30% |
| 128 | 20 | 2 | 20.0% | 0.6% | 1098.0 | 40% | 5088.5 | 70% | 14267.1 | 50% | 54209.6 | 50% |
| 128 | 20 | 3 | 30.0% | 0.5% | 1892.9 | 30% | 8988.9 | 40% | 12016.8 | 60% | 44082.4 | 60% |
| 128 | 40 | 1 | 10.0% | 0.1% | 1449.0 | 30% | 10143.9 | 40% | 25313.2 | 40% | 105958.6 | 20% |
| 128 | 40 | 2 | 20.0% | 0.2% | 1520.5 | 50% | 9324.9 | 50% | 13568.4 | 60% | 51927.7 | 50% |
| 128 | 40 | 3 | 30.0% | 0.1% | 4103.5 | 40% | 18628.1 | 40% | 13484.4 | 70% | 48213.4 | 50% |
| 128 | 80 | 1 | 10.1% | 0.3% | 956.5 | 50% | 3444.1 | 70% | 23486.7 | 50% | 93787.0 | 30% |
| 128 | 80 | 2 | 20.0% | 0.3% | 1670.3 | 60% | 8305.1 | 70% | 12316.0 | 70% | 44018.8 | 60% |
| 128 | 80 | 3 | 30.0% | 0.9% | 3914.9 | 20% | 19014.0 | 30% | 15326.8 | 60% | 63039.7 | 60% |
Version 3 benchmarks
| Total Tasks | Max Concurrency | Replica Count | Node Count | Failure Rate | Avg Run Time | Avg Queue Time | Max Queue Time | Max Run Time |
|---|---|---|---|---|---|---|---|---|
| 128 | 80 | 3 | 5 | 0.000000 | 3855 | 76667 | 103048 | 11022 |
| 80 | 80 | 3 | 5 | 0.012500 | 3951 | 45000 | 60557 | 8556 |
| 40 | 80 | 3 | 5 | 0.000000 | 2386 | 24865 | 32445 | 10187 |
| 20 | 80 | 3 | 5 | 0.000000 | 1939 | 18014 | 23248 | 3095 |
| 128 | 40 | 3 | 5 | 0.007812 | 5089 | 90771 | 117578 | 19652 |
| 80 | 40 | 3 | 5 | 0.000000 | 2886 | 56460 | 69849 | 7609 |
| 40 | 40 | 3 | 5 | 0.000000 | 2146 | 26668 | 35319 | 3508 |
| 20 | 40 | 3 | 5 | 0.000000 | 2038 | 19586 | 24868 | 3014 |
| 128 | 20 | 3 | 5 | 0.000000 | 6413 | 70101 | 109269 | 31100 |
| 80 | 20 | 3 | 5 | 0.000000 | 3078 | 51401 | 72506 | 6939 |
| 40 | 20 | 3 | 5 | 0.000000 | 2127 | 31081 | 36791 | 3623 |
| 20 | 20 | 3 | 5 | 0.000000 | 2205 | 16902 | 19836 | 3304 |
| 128 | 80 | 2 | 5 | 0.007812 | 2848 | 78955 | 111321 | 5731 |
| 80 | 80 | 2 | 5 | 0.000000 | 2246 | 56652 | 87118 | 5992 |
| 20 | 80 | 2 | 5 | 0.000000 | 1721 | 17674 | 23279 | 2259 |
| 40 | 80 | 2 | 5 | 0.000000 | 2135 | 29990 | 36930 | 3248 |
| 128 | 40 | 2 | 5 | 0.007812 | 2532 | 72492 | 108279 | 6756 |
| 80 | 40 | 2 | 5 | 0.000000 | 3620 | 56225 | 75590 | 9391 |
| 40 | 40 | 2 | 5 | 0.000000 | 2048 | 24523 | 33774 | 3154 |
| 20 | 40 | 2 | 5 | 0.000000 | 1927 | 15072 | 18269 | 2732 |
| 128 | 20 | 2 | 5 | 0.000000 | 2325 | 62237 | 107474 | 5076 |
| 80 | 20 | 2 | 5 | 0.000000 | 2553 | 42657 | 67140 | 5982 |
| 40 | 20 | 2 | 5 | 0.000000 | 2235 | 28932 | 36972 | 3601 |
| 20 | 20 | 2 | 5 | 0.000000 | 1957 | 16123 | 22835 | 2974 |
| 128 | 80 | 1 | 5 | 0.000000 | 2105 | 113833 | 190044 | 5106 |
| 80 | 80 | 1 | 5 | 0.000000 | 2497 | 82633 | 135382 | 6952 |
| 40 | 80 | 1 | 5 | 0.000000 | 2092 | 37600 | 65750 | 3630 |
| 20 | 80 | 1 | 5 | 0.000000 | 1842 | 19383 | 24808 | 3004 |
| 128 | 40 | 1 | 5 | 0.000000 | 2049 | 109442 | 207049 | 5524 |
| 80 | 40 | 1 | 5 | 0.000000 | 1932 | 73936 | 135250 | 3757 |
| 40 | 40 | 1 | 5 | 0.000000 | 1937 | 40138 | 51027 | 3343 |
| 20 | 40 | 1 | 5 | 0.000000 | 1802 | 17303 | 22432 | 2592 |
| 128 | 20 | 1 | 5 | 0.000000 | 1809 | 107782 | 207405 | 3281 |
| 80 | 20 | 1 | 5 | 0.000000 | 1755 | 66260 | 126222 | 2863 |
| 40 | 20 | 1 | 5 | 0.000000 | 1786 | 35307 | 60009 | 2738 |
| 20 | 20 | 1 | 5 | 0.000000 | 2092 | 23581 | 30639 | 2662 |
| Average | | | | | 2499 | 48785 | 74731 | 5943 |
| Minimum | | | | | 1721 | 15072 | 18269 | 2259 |
| Max | | | | | 6413 | 113833 | 207405 | 31100 |
In summary, the average improvements of GOAT are as follows:
| Average Run Time | Max Run Time | Average Queue Time | Max Queue Time |
|---|---|---|---|
| 5% | 1% | 56% | 49% |
In some instances, GOAT showed lower performance than the 3.0 self-hosted runner. In these cases, the differences are on the order of milliseconds and can often be attributed to cluster, network, and compute conditions. While some differences may appear extreme, they are often outliers in the 95th (or higher) percentile. The table above is the result of repeating the experiment four times for each row. When these extremes are considered in the context of the rest of the experiments, the net result is still positive for run times.
In queuing, where the most dramatic performance increase is observed, the results are much more consistent and are less influenced by external factors such as remote API calls.
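The summary percentages appear to be the simple mean of the four `Improvement From 3.0` columns in the GOAT benchmarks table. The sketch below recomputes them under that assumption; the lists are copied row by row from that table, and the variable names are illustrative only.

```python
# Per-row "Improvement From 3.0" values (percent), in GOAT table row order.
improvements = {
    "Average Run Time": [
        10, -10, 0, -20, 20, -10, -20, 0, -10,
        0, -10, -30, -50, -10, -10, 0, -10, -20,
        10, 30, 0, -60, 20, -30, 50, 10, 10,
        10, 40, 30, 30, 50, 40, 50, 60, 20,
    ],
    "Max Run Time": [
        -350, -10, -10, 20, -260, 10, -40, 0, 10,
        -20, 10, 0, 30, -10, 0, 10, 30, 0,
        20, 60, 10, -60, 10, 10, 80, 20, 40,
        0, 70, 40, 40, 50, 40, 70, 70, 30,
    ],
    "Average Queue Time": [
        50, 50, 50, 60, 50, 50, 60, 50, 50,
        50, 70, 60, 60, 60, 60, 60, 60, 50,
        40, 60, 60, 50, 60, 60, 60, 60, 60,
        30, 50, 60, 40, 60, 70, 50, 70, 60,
    ],
    "Max Queue Time": [
        50, 50, 50, 50, 50, 50, 50, 40, 50,
        40, 60, 50, 50, 60, 50, 50, 60, 50,
        30, 50, 60, 30, 50, 50, 50, 60, 60,
        30, 50, 60, 20, 50, 50, 30, 60, 60,
    ],
}

# Unweighted mean over all 36 configurations, rounded to a whole percent.
summary = {
    metric: round(sum(values) / len(values))
    for metric, values in improvements.items()
}

for metric, value in summary.items():
    print(f"{metric}: {value}%")
```

Under this assumption the script reproduces the published 5%, 1%, 56%, and 49%. It also shows how strongly the two run-time outliers (-350% and -260%) pull the Max Run Time mean toward zero.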
Runner configuration recommendations
These recommendations use the reference architecture of GKE 1.29.4 with a node pool of 5 E2-medium nodes. They build on the benchmarks above and cover two container runner cluster settings:
- Replica count of the container agent.
- Maximum concurrent task configuration.
High performance cluster
- 3 replicas of container agent.
- 80 concurrent tasks per replica.
This configuration trades a little stability (a slightly higher rate of infrastructure failures) for much higher task throughput and shorter queueing times.
High stability cluster
- 1 replica of container agent.
- 20 concurrent tasks per replica.
This configuration trades off throughput for higher stability, with minimal infrastructure failures. Note this is the default configuration for the container agent Helm chart.
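To make the difference between the two profiles concrete, the following sketch computes the upper bound on tasks a cluster could have in flight at once. It assumes the concurrency limit applies independently to each replica, as the "per replica" wording suggests; `ClusterProfile` and its field names are hypothetical and are not Helm chart keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterProfile:
    """Hypothetical model of the two settings discussed above."""

    name: str
    replicas: int              # container agent replica count
    max_concurrent_tasks: int  # limit per replica

    @property
    def max_in_flight_tasks(self) -> int:
        # Assumption: each replica enforces its own limit, so the bounds add up.
        return self.replicas * self.max_concurrent_tasks


HIGH_PERFORMANCE = ClusterProfile("high performance", replicas=3, max_concurrent_tasks=80)
HIGH_STABILITY = ClusterProfile("high stability", replicas=1, max_concurrent_tasks=20)

for profile in (HIGH_PERFORMANCE, HIGH_STABILITY):
    print(f"{profile.name}: up to {profile.max_in_flight_tasks} tasks in flight")
```

Under that assumption, the high performance profile allows up to 240 tasks in flight against 20 for the high stability profile. That 12x gap is one way to see why it puts more pressure on the same five-node reference cluster.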
When tuning a cluster for performance there are three main variables to consider: container agent replica count, maximum concurrent tasks per replica, and node pool configuration.
Container agent replica count
The more replicas of container agent, the faster tasks will get claimed, as each replica runs its own collection of claiming loops. More replicas help when you have sudden large backlogs of tasks to run: tasks can be claimed more quickly and have a pod spec submitted to the Kubernetes cluster for scheduling. However, more replicas and more concurrent tasks increase strain on the Kubernetes control plane, which makes task start failures more likely. CircleCI container runners will attempt to reschedule a task up to three times before declaring an infrastructure failure.
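The rescheduling behavior can be pictured as a bounded retry loop. This is a minimal illustrative sketch, not container agent's actual code. It assumes one initial attempt followed by up to three reschedules, which is only one reading of "up to three times". `start_task`, `InfrastructureFailure`, and the `schedule` callback are hypothetical names.

```python
from typing import Callable

MAX_RESCHEDULES = 3  # "up to three times", per the text above


class InfrastructureFailure(Exception):
    """Raised when a task could not be started within the retry budget."""


def start_task(schedule: Callable[[], bool],
               max_reschedules: int = MAX_RESCHEDULES) -> int:
    """Try to start a task; return the number of attempts that were needed.

    Assumption: one initial attempt plus `max_reschedules` further attempts.
    """
    total_attempts = 1 + max_reschedules
    for attempt in range(1, total_attempts + 1):
        if schedule():  # True means the task's pod started successfully
            return attempt
    raise InfrastructureFailure(
        f"task did not start after {total_attempts} attempts"
    )


# Example: a scheduler that fails twice, then succeeds on the third attempt.
outcomes = iter([False, False, True])
attempts_used = start_task(lambda: next(outcomes))
print(attempts_used)  # 3
```

A task that fails every attempt raises `InfrastructureFailure`, matching the infrastructure failures counted in the benchmark failure rates.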
Maximum concurrent tasks per replica
This number in particular is very sensitive to node types and counts. The more tasks that launch in a short window, the higher the strain on the Kubernetes cluster’s control plane. Individual kubelets, which manage pods and containers on a specific node, also experience increased strain. As node power and count increase, the impact of concurrent tasks on a cluster decreases. The lower the number of maximum concurrent tasks, the greater the reliability of tasks successfully starting and not experiencing an infrastructure failure.
The likelihood of an infrastructure failure for a task decreases as node count and resources are increased, particularly CPU.
Node types and count
The recommendations already presented are based on the reference cluster configuration. As a node pool grows, or is set to an instance type with greater resources, task execution becomes more reliable. When sizing a cluster, add headroom beyond what you expect an individual task to need. The kubelet and container driver share the same resources as the pods on the node. The more resource starved they become, the more prone tasks are to long queue times and infrastructure failures. The more widely pods can be spread across nodes, the less pressure and backlog fall on individual kubelets and container engines, resulting in shorter queueing times.
Troubleshooting
Refer to the Troubleshoot Container Runner section of the Troubleshoot Self-hosted Runner guide if you encounter issues installing or using container runner.