Hardening your cluster
On This Page
- Network topology
- Network traffic
- Reverse proxy status
- Kubernetes load balancers
- Ingress
- Egress
- Common rules for compute instances
- Ingress
- Egress
- Kubernetes nodes
- Intra-node traffic
- Ingress
- Egress
- Nomad clients
- Ingress
- Egress
- External VMs
- Ingress
- Egress
- Notes on AWS networking with VM service
- Private IPs only
- Using public IPs
- Next steps
This section provides supplemental information on hardening your Kubernetes cluster.
Network topology
A server installation basically runs three different type of compute instances: The Kubernetes nodes, Nomad clients, and external VMs.
Best practice is to make as many of the resources as private as possible. If your users will access your CircleCI server installation via VPN, there is no need to assign any public IP addresses at all, as long as you have a working NAT gateway setup. Otherwise, you will need at least one public subnet for the circleci-proxy
load balancer.
However, in this case, it is also recommended to place Nomad clients and VMs in a public subnet to enable your users to SSH into jobs and scope access via networking rules.
An nginx reverse proxy is placed in front of Kong and exposed as a Kubernetes service named circleci-proxy . nginx is responsible routing the traffic to the following services: kong , vm-service , output-processor and nomad . |
When using Amazon Certificate Manager (ACM), the name of nginx’s service will be circleci-proxy-acm instead of circleci-proxy . If you have switched from some other method of handling your TLS certificates to using ACM, this change will recreate the load balancer and you will have to reroute your associated DNS records for your <domain> and app.<domain>. |
Network traffic
This section explains the minimum requirements for a server installation to work. Depending on your workloads, you might need to add additional rules to egress for Nomad clients and VMs. As nomenclature between cloud providers differs, you will probably need to implement these rules using firewall rules and/or security groups.
Where you see "external," this usually means all external IPv4 addresses. Depending on your particular setup, you might be able to be more specific (for example, if you are using a proxy for all external traffic).
The rules explained here are assumed to be stateful and for TCP connections only, unless stated otherwise. If you are working with stateless rules, you need to create matching ingress or egress rules for the ones listed here.
Reverse proxy status
You may wish to check the status of the services routing traffic in your CircleCI server installation and alert if there are any issues. Since we use both nginx and Kong in CircleCI server, we expose the status pages of both via port 80.
Service | Endpoint |
---|---|
nginx |
|
Kong |
|
Kubernetes load balancers
Depending on your setup, your load balancers might be transparent (that is, they are not treated as a distinct layer in your networking topology). In this case, you can apply the rules from this section directly to the underlying destination or source of the network traffic. Refer to the documentation of your cloud provider to make sure you understand how to correctly apply networking security rules, given the type of load balancing you are using with your installation.
Ingress
If the traffic rules for your load balancers have not been created automatically, here are their respective ports:
Name | Port | Source | Purpose |
---|---|---|---|
circleci-proxy/-acm | 80 | External | User Interface & Frontend API |
circleci-proxy/-acm | 443 | External | User Interface & Frontend API |
circleci-proxy/-acm | 3000 | Nomad clients | Communication with Nomad clients |
circleci-proxy/-acm | 4647 | Nomad clients | Communication with Nomad clients |
circleci-proxy/-acm | 8585 | Nomad clients | Communication with Nomad clients |
Egress
The only type of egress needed is TCP traffic to the Kubernetes nodes on the Kubernetes load balancer ports (30000-32767). This is not needed if your load balancers are transparent.
Common rules for compute instances
These rules apply to all compute instances, but not to the load balancers.
Ingress
If you want to access your instances using SSH, you will need to open port 22 for TCP connections for the instances in question. It is recommended to scope the rule as closely as possible to allowed source IPs and/or only add such a rule when needed.
Egress
You most likely want all of your instances to access internet resources. This requires you to allow egress for UDP and TCP on port 53 to the DNS server within your VPC, as well as TCP ports 80 and 443 for HTTP and HTTPS traffic, respectively. Instances building jobs (that is, the Nomad clients and external VMs) also will likely need to pull code from your VCS using SSH (TCP port 22). SSH is also used to communicate with external VMs, so it should be allowed for all instances with the destination of the VM subnet and your VCS, at the very least.
Kubernetes nodes
Intra-node traffic
By default, the traffic within your Kubernetes cluster is regulated by networking policies. For most purposes, this should be sufficient to regulate the traffic between pods and there is no additional requirement to reduce traffic between Kubernetes nodes any further (it is fine to allow all traffic between Kubernetes nodes).
To make use of networking policies within your cluster, you may need to take additional steps, depending on your cloud provider and setup. Here are some resources to get you started:
Ingress
If you are using a managed service, you can check the rules created for the traffic coming from the load balancers and the allowed port range. The standard port range for Kubernetes load balancers (30000-32767) should be all that is needed here for ingress. If you are using transparent load balancers, you need to apply the ingress rules listed for load balancers above.
Egress
Port | Destination | Purpose |
---|---|---|
2376 | VMs | Communication with VMs |
4647 | Nomad clients | Communication with the Nomad clients |
all traffic | other nodes | Allow intra-cluster traffic |
Nomad clients
Nomad clients do not need to communicate with each other. You can block traffic between Nomad client instances completely.
Ingress
Port | Source | Purpose |
---|---|---|
4647 | K8s nodes | Communication with Nomad server |
64535-65535 | External | Rerun jobs with SSH functionality |
Egress
Port | Destination | Purpose |
---|---|---|
22 | VMs | SSH communication with VMs |
2376 | VMs | Docker communication with VMs |
3000 | VM Service load balancers | Internal communication |
4647 | Nomad Load Balancer | Internal communication |
8585 | Output Processor Load Balancer | Internal communication |
External VMs
Similar to Nomad clients, there is no need for external VMs to communicate with each other.
Ingress
Port | Source | Purpose |
---|---|---|
22 | Kubernetes nodes | Internal communication |
22 | Nomad clients | Internal communication |
2376 | Kubernetes nodes | Internal communication |
2376 | Nomad clients | Internal communication |
54782 | External | Rerun jobs with SSH functionality |
Egress
You will only need the egress rules for internet access and SSH for your VCS.
Notes on AWS networking with VM service
When using the EC2 provider for VM service, there is an assignPublicIP
option available in the values.yaml
file.
vm_service:
...
providers:
ec2:
...
assignPublicIP: false
By default this option is set to false, meaning any instance created by VM service will only be assigned a private IP address.
Communication to start a virtual machine (VM), and run a job, occurs in two stages:
-
The
vm-service
pod establishes a connection to the newly created VM via ports22
and2376
. -
The Nomad client running the job establishes a connection to the newly created VM via ports
22
and2376
.
Private IPs only
When the assignPublicIP
option is set to false, restricting traffic with security group rules between services can be done using the Source Security Group ID parameter.
Within the ingress rules of the VM security group, the following rules can be created to harden your installation:
Port | Origin | Purpose |
---|---|---|
22 | Nomad clients' security group | Allows Nomad clients to SSH into VM |
2376 | Nomad clients' security group | Allows Nomad clients to connect to Docker on VM |
22 | EKS cluster security group | Allows vm-service pods to SSH into VM |
2376 | EKS cluster security group | Allows vm-service pods to connect to Docker on VM |
54782 | CIDR range of your choice | Allows users to SSH into failed vm-based jobs and to retry and debug |
Using public IPs
When the assignPublicIP
option is set to true, all EC2 instances created by VM service are assigned public IPv4 addresses, and, as such, all services communicating with them do so via their public addresses.
SSH traffic from the vm-service
pod will flow through the NAT gateway of the subnet of the cluster. Since traffic moves outside the VPC it is not possible to restrict traffic by security group origin. It is instead necessary to add the IPs of the NAT gateway(s) used by the cluster to your safelist.
If both Nomad clients and VM service VMs have been assigned public IPs, SSH and Docker traffic will route through the subnets' internet gateways. Since traffic moves through the public internet, security groups are no longer an option for restricting traffic. In order to restrict access on these ports, the public IPv4 addresses of the Nomad clients must be added to the safelist in the VM service security group ingress rules. Keep in mind that these IPs and machines are ephemeral, and will require a mechanism to update the VM service security group on change.
When hardening an installation where the VM service uses public IPs, the following rules can be created.
Port | Origin | Purpose |
---|---|---|
22 | Individual ipv4 addresses of all Nomad clients (or 0.0.0.0/0 to allow for any possible assigned IP). | Allows Nomad clients to SSH into VM. |
2376 | Individual ipv4 addresses of all Nomad clients (or 0.0.0.0/0 to allow for any possible assigned IP). | Allows Nomad clients to connect to Docker on VM. |
22 | Cluster NAT gateway ipv4 ranges | Allows traffic to the VM from the |
2376 | Cluster NAT gateway ipv4 ranges | Allows traffic to the VM from the |
54782 | CIDR range of your choice | Allows users to SSH into failed vm-based jobs to retry and debug. |