cmd+/

Container runner performance benchmarks

Cloud Server v4.3+

Runner benchmarks show the performance tradeoffs of CircleCI self-hosted runners up to a non-acceptable error threshold. From the chart below, you can see there is a trade off between the following:

ReplicaSet
Concurrency
Tasks
Queue
Run time

Depending on your team’s workload types, for example, high parallelism, fan-in/out etc. you may need to adjust your cluster for high concurrency and tasks, potentially impacting queuing, run time, and other factors.

By publishing our benchmarks we can make measurable improvements to the performance and scale of CircleCI self-hosted runner, and show the impact of those improvements.

These benchmarks used a GKE (Google) cluster with 5 dedicated E2-medium nodes. The cos_containerd image was used with GKE version 1.29.4 and no autoscaling.

The tables below detail the aggregation of results from testing the 3.1 self-hosted runner (GOAT) compared to the same tests run with the 3.0 self-hosted runner. Version 3.1 introduced a major re-architecture of container runner to address performance, stability, and reliability. For more technical background on 3.1, refer to the runner-init project’s README.

Compared with the 3.0 self-hosted runner, GOAT shows a net improvement across all four categories. The most notable improvement is the reduction in queue times. GOAT also demonstrated a minor but notable decrease in failed runs, showing an improvement when run under stress.

GOAT benchmarks

			Failure Rate		Average Run Time		Max Run Time		Average Queue Time		Max Queue Time
Total Tasks	Max Concurrency	Replica Count	Failure Rate (GOAT)	Failure Rate (3.0)	Average Run Time (GOAT)	Improvement From 3.0	Max Run Time (GOAT)	Improvement From 3.0	Average Queue Time (GOAT)	Improvement From 3.0	Max Queue Time (GOAT)	Improvement From 3.0
20	20	1	0.0%	0.0%	934.8	10%	11196.8	-350%	5173.3	50%	12792.5	50%
20	20	2	20.0%	0.0%	955.6	-10%	2602.0	-10%	4074.2	50%	10249.9	50%
20	20	3	30.0%	0.0%	914.0	0%	2546.8	-10%	4037.3	50%	9850.8	50%
20	40	1	10.0%	0.6%	1046.7	-20%	3784.8	20%	4926.4	60%	12303.0	50%
20	40	2	20.0%	0.0%	899.4	20%	9896.0	-260%	4546.5	50%	10972.2	50%
20	40	3	30.0%	0.0%	934.3	-10%	2285.8	10%	3914.5	50%	9070.2	50%
20	80	1	10.0%	0.0%	930.0	-20%	3156.4	-40%	4358.7	60%	11504.0	50%
20	80	2	20.0%	0.0%	892.3	0%	2864.5	0%	4164.0	50%	10028.6	40%
20	80	3	30.0%	0.0%	975.5	-10%	2551.2	10%	3862.6	50%	9389.4	50%
40	20	1	10.0%	0.0%	857.0	0%	3685.0	-20%	8345.5	50%	28749.8	40%
40	20	2	20.0%	0.6%	1332.7	-10%	4023.9	10%	5199.1	70%	14260.5	60%
40	20	3	30.0%	0.9%	1415.6	-30%	4421.7	0%	5571.4	60%	15838.6	50%
40	40	1	10.0%	0.3%	1875.5	-50%	9882.1	30%	7290.1	60%	22519.5	50%
40	40	2	20.0%	0.3%	1088.4	-10%	3786.4	-10%	5639.7	60%	16400.2	60%
40	40	3	30.0%	0.0%	1357.3	-10%	4259.0	0%	5657.6	60%	15374.1	50%
40	80	1	10.0%	0.0%	910.6	0%	2673.0	10%	7457.9	60%	23214.5	50%
40	80	2	20.0%	0.6%	1393.7	-10%	5733.2	30%	5582.0	60%	16533.2	60%
40	80	3	30.0%	0.0%	1347.5	-20%	3954.9	0%	6279.2	50%	15712.4	50%
80	20	1	10.0%	0.1%	855.0	10%	2718.1	20%	16916.3	40%	64214.3	30%
80	20	2	20.0%	0.3%	1259.3	30%	4594.2	60%	9396.9	60%	32960.3	50%
80	20	3	30.0%	0.4%	2123.9	0%	9011.4	10%	8371.7	60%	28033.0	60%
80	40	1	10.0%	0.0%	1651.7	-60%	10064.1	-60%	15135.1	50%	59203.1	30%
80	40	2	20.0%	0.3%	1960.8	20%	8564.9	10%	9610.8	60%	30498.3	50%
80	40	3	30.0%	0.3%	2781.8	-30%	9568.8	10%	9575.0	60%	28208.7	50%
80	80	1	10.0%	0.1%	959.0	50%	2922.2	80%	13394.6	60%	44709.2	50%
80	80	2	20.0%	0.3%	1519.4	10%	5510.7	20%	9007.4	60%	28765.2	60%
80	80	3	30.0%	0.1%	2344.5	10%	9807.7	40%	8877.2	60%	30428.8	60%
128	20	1	10.0%	0.1%	896.8	10%	4684.8	0%	27022.6	30%	106310.2	30%
128	20	2	20.0%	0.6%	1098.0	40%	5088.5	70%	14267.1	50%	54209.6	50%
128	20	3	30.0%	0.5%	1892.9	30%	8988.9	40%	12016.8	60%	44082.4	60%
128	40	1	10.0%	0.1%	1449.0	30%	10143.9	40%	25313.2	40%	105958.6	20%
128	40	2	20.0%	0.2%	1520.5	50%	9324.9	50%	13568.4	60%	51927.7	50%
128	40	3	30.0%	0.1%	4103.5	40%	18628.1	40%	13484.4	70%	48213.4	50%
128	80	1	10.1%	0.3%	956.5	50%	3444.1	70%	23486.7	50%	93787.0	30%
128	80	2	20.0%	0.3%	1670.3	60%	8305.1	70%	12316.0	70%	44018.8	60%
128	80	3	30.0%	0.9%	3914.9	20%	19014.0	30%	15326.8	60%	63039.7	60%

Version 3 benchmarks

Total Tasks	Max Concurrency	Replica Count	Node Count	Failure Rate	Avg Run time	Avg Queue Time	Max Queue Time	Max Run Time
128	80	3	5	0.000000	3855	76667	103048	11022
80	80	3	5	0.012500	3951	45000	60557	8556
40	80	3	5	0.000000	2386	24865	32445	10187
20	80	3	5	0.000000	1939	18014	23248	3095
128	40	3	5	0.007812	5089	90771	117578	19652
80	40	3	5	0.000000	2886	56460	69849	7609
40	40	3	5	0.000000	2146	26668	35319	3508
20	40	3	5	0.000000	2038	19586	24868	3014
128	20	3	5	0.000000	6413	70101	109269	31100
80	20	3	5	0.000000	3078	51401	72506	6939
40	20	3	5	0.000000	2127	31081	36791	3623
20	20	3	5	0.000000	2205	16902	19836	3304
128	80	2	5	0.007812	2848	78955	111321	5731
80	80	2	5	0.000000	2246	56652	87118	5992
20	80	2	5	0.000000	1721	17674	23279	2259
40	80	2	5	0.000000	2135	29990	36930	3248
128	40	2	5	0.007812	2532	72492	108279	6756
80	40	2	5	0.000000	3620	56225	75590	9391
40	40	2	5	0.000000	2048	24523	33774	3154
20	40	2	5	0.000000	1927	15072	18269	2732
128	20	2	5	0.000000	2325	62237	107474	5076
80	20	2	5	0.000000	2553	42657	67140	5982
40	20	2	5	0.000000	2235	28932	36972	3601
20	20	2	5	0.000000	1957	16123	22835	2974
128	80	1	5	0.000000	2105	113833	190044	5106
80	80	1	5	0.000000	2497	82633	135382	6952
40	80	1	5	0.000000	2092	37600	65750	3630
20	80	1	5	0.000000	1842	19383	24808	3004
128	40	1	5	0.000000	2049	109442	207049	5524
80	40	1	5	0.000000	1932	73936	135250	3757
40	40	1	5	0.000000	1937	40138	51027	3343
20	40	1	5	0.000000	1802	17303	22432	2592
128	20	1	5	0.000000	1809	107782	207405	3281
80	20	1	5	0.000000	1755	66260	126222	2863
40	20	1	5	0.000000	1786	35307	60009	2738
20	20	1	5	0.000000	2092	23581	30639	2662
				Average	2499	48785	74731	5943
				Minimum	1721	15072	18269	2259
				Max	6413	113833	207405	31100

In summary, the average improvements of GOAT are as follows:

Average Run Time	Max Run Time	Average Queue Time	Max Queue Time
5%	1%	56%	49%

Average Run Time

Max Run Time

Average Queue Time

Max Queue Time

56%

49%

In some instances, GOAT showed lower performance than the 3.0 self-hosted runner. In these cases, the differences are on the order of milliseconds and can often be attributed to cluster, network, and compute conditions. While some differences may appear extreme, they are often outliers in the 95th (or higher) percentile. The table above is the result of repeating the experiment four times for each row. When these extremes are considered in the context of the rest of the experiments, the net result is still positive for run times.

In queuing, where the most dramatic performance increase is observed, the results are much more consistent and are less influenced by external factors such as remote API calls.

Runner configuration recommendations

These recommendations use the reference architecture of GKE 1.29.4 with a node pool of 5 E2 medium nodes. They build on the benchmarks above for container runner cluster configuration:

Replica count of the container agent.
Maximum concurrent task configuration.

High performance cluster

3 replicas of container agent.
80 concurrent tasks per replica.

This configuration makes a slight trade off in stability, a slightly higher rate of infrastructure failures, to achieve much higher task throughput and to reduce queueing times.

High stability cluster

1 replica of container agent.
20 concurrent tasks per replica.

This configuration trades off throughput for higher stability, with minimal infrastructure failures. Note this is the default configuration for the container agent Helm chart.

When tuning a cluster for performance there are three main variables to consider: container agent replica count, maximum concurrent tasks per replica, and node pool configuration.

Container agent replica count

The more replicas of container agent, the faster tasks will get claimed, as each replica runs its own collection of claiming loops. More replicas help when you have sudden large backlogs of tasks to run. Tasks can be claimed more quickly and have a pod spec submitted to the Kubernetes cluster for scheduling. More replicas and concurrent tasks increase strain on the K8s control plane. This also makes you more prone to task start failures. CircleCI container runners will attempt to reschedule a task up to three times before declaring an infrastructure failure.

Maximum concurrent tasks per replica

This number in particular is very sensitive to node types and counts. The more tasks that launch in a short window, the higher the strain on the Kubernetes cluster’s control plane. Individual kubelets, which manage pods and containers on a specific node, also experience increased strain. As node power and count increase, the impact of concurrent tasks on a cluster decreases. The lower the number of maximum concurrent tasks, the greater the reliability of tasks successfully starting and not experiencing an infrastructure failure.

The likelihood of an infrastructure failure for a task decreases as node count and resources are increased, particularly CPU.

Node types and count

The recommendations already presented are based on the reference cluster configuration. As a node pool grows, or is set to an instance type with greater resources, task execution becomes more reliable. When sizing a cluster, add headspace beyond that expected for an individual task. The kubelet and container driver share the same resources as the pods on the node. The more resource starved they become, the more prone to long queue times and infrastructure failures tasks become. The more distributed pods are able to be scheduled, the less pressure and backlog are applied to the individual kubelets and container engines, resulting in shorter queueing times.

Troubleshooting

Refer to the Troubleshoot Container Runner section of the Troubleshoot Self-hosted Runner guide if you encounter issues installing or using container runner.

[path]

[title]