# Introduction to Nomad cluster operation

CircleCI uses [Nomad](https://www.nomadproject.io/) as the primary job scheduler. This section introduces the Nomad concepts you need to operate the Nomad cluster in your CircleCI server installation.

## Basic terminology and architecture

Figure 1. Nomad Cluster Management

*   **Nomad server:** Nomad servers are the brains of the cluster. They receive and allocate jobs to Nomad clients. In CircleCI server, a Nomad server runs as a service in your Kubernetes cluster.
    
*   **Nomad client:** Nomad clients execute the jobs that Nomad servers allocate to them. A Nomad client usually runs on a dedicated machine (often a VM) so that jobs can use the machine's full resources. Multiple Nomad clients form a cluster, and the Nomad server allocates jobs across the cluster using its scheduling algorithm.
    
*   **Nomad jobs:** A Nomad job is a specification, provided by a user, that declares a workload for Nomad. A Nomad job corresponds to an execution of a CircleCI job. If the job uses [parallelism](https://circleci.com/docs/guides/optimize/parallelism-faster-jobs/), for example `parallelism: 10`, then Nomad runs 10 jobs.
    
*   **Build agent:** The build agent is a Go program written by CircleCI that executes the steps in a job and reports the results. The build agent runs as the main process inside a Nomad job.
    

## Basic operations

This section is a basic guide to operating the Nomad cluster in your installation.

The `nomad` CLI is installed in the Nomad server pod and is preconfigured to talk to the Nomad cluster, so you can run the commands in this section by combining `kubectl exec` with the `nomad` command.
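
If you run these commands often, a small wrapper saves typing. The sketch below is a hypothetical helper, not part of CircleCI server: it assumes you set a `NOMAD_SERVER_POD` variable (a name chosen here for illustration) to the value you would use for `<nomad-server-pod-ID>`. It leaves out `-t` so that no terminal is allocated and the output can be piped into other tools, and it lets you override the `kubectl` binary through a `KUBECTL` variable.

`````````shell
# Hypothetical helper: run a nomad CLI command inside the Nomad server pod.
# NOMAD_SERVER_POD must hold the pod name (find it with `kubectl get pods`).
nomad_exec() {
  if [ -z "${NOMAD_SERVER_POD:-}" ]; then
    echo "nomad_exec: set NOMAD_SERVER_POD to the Nomad server pod name" >&2
    return 1
  fi
  # -i keeps stdin attached; -t is omitted so the output pipes cleanly.
  "${KUBECTL:-kubectl}" exec -i "$NOMAD_SERVER_POD" -- nomad "$@"
}

# Example: nomad_exec status
`````````

With the variable set, `nomad_exec status` runs the same `nomad status` command used for checking job status.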

### Checking job status

To get a list of statuses for all jobs in your cluster, run the following command:

`````````
kubectl exec -it <nomad-server-pod-ID> -- nomad status
`````````

The `Status` field is the most important part of the output. Its values mean the following:

*   `running`: Nomad has started executing the job. This typically means your CircleCI job has started.
    
*   `pending`: Nomad has not yet placed the job, typically because there are not enough resources available in the cluster to execute it.
    
*   `dead`: Nomad has finished executing the job. The status becomes `dead` regardless of whether the corresponding CircleCI job/build succeeds or fails.
    
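When the cluster has many jobs, a count per status is often easier to read than the full list. The sketch below reads `nomad status` output on stdin and counts jobs by the value in the `Status` column. It assumes the first line is the column header and that no column before `Status` contains spaces, which holds for the default `ID`, `Type`, `Priority`, `Status` layout; the exact columns may differ between Nomad versions.

`````````shell
# Summarize `nomad status` output read on stdin: print "<status> <count>".
# Assumes a header line that contains a "Status" column.
summarize_job_status() {
  awk '
    NR == 1 {
      # Find which field holds the status.
      for (i = 1; i <= NF; i++) if ($i == "Status") col = i
      next
    }
    col && NF >= col { count[$col]++ }
    END { for (s in count) print s, count[s] }
  ' | sort
}

# Example (no -t, so the output pipes cleanly):
#   kubectl exec <nomad-server-pod-ID> -- nomad status | summarize_job_status
`````````

A steadily growing `pending` count is a sign that the cluster needs more Nomad clients.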

### Checking the cluster status

To get a list of your Nomad clients, run the following command:

`````````
kubectl exec -it <nomad-server-pod-ID> -- nomad node-status
`````````

`nomad node-status` reports both Nomad clients that are currently serving (status `ready`) and Nomad clients that have been taken out of the cluster (status `down`). To know the current capacity of your cluster, count the `ready` Nomad clients.
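
To turn the client list into a single capacity number, you can count the rows with a given status. The sketch below reads `nomad node-status` output on stdin, assumes `Status` is the last column (as in the default table layout), and takes the status to count as its argument. Check which value your installation's output shows for serving clients before relying on the count.

`````````shell
# Count clients in `nomad node-status` output (stdin) whose Status, the
# last column, equals the first argument, for example "ready" or "down".
count_nodes_by_status() {
  awk -v want="$1" 'NR > 1 && $NF == want { n++ } END { print n + 0 }'
}

# Example:
#   kubectl exec <nomad-server-pod-ID> -- nomad node-status | count_nodes_by_status ready
`````````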

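To get more information about a specific client, pass its node ID from the `nomad node-status` output:

`````````
kubectl exec -it <nomad-server-pod-ID> -- nomad node-status <node-id>
`````````

If you are logged in to the client itself, `nomad node-status -self` shows the same information.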

This gives information such as how many jobs are running on the client and the resource utilization of the client.

### Checking logs

A Nomad job corresponds to an execution of a CircleCI job. Therefore, Nomad job logs can sometimes help to understand the status of a CircleCI job if there is a problem. To get logs for a specific job, run the following command:

`````````
kubectl exec -it <nomad-server-pod-ID> -- nomad logs -job -stderr <nomad-job-id>
`````````

Be sure to specify the `-stderr` flag, because most build agent logs are written to standard error.

While the `nomad logs -job` command is convenient, it may not show the logs you are looking for, because the `-job` flag picks a random allocation of the specified job. An allocation is a smaller unit within a Nomad job; a full explanation is beyond the scope of this document. To learn more, see the [Nomad scheduling documentation](https://www.nomadproject.io/docs/internals/scheduling.html).

To get logs from a specific allocation of a job, complete the following steps:

1.  Get the job ID with the `nomad status` command.
    
2.  Get the allocation ID of the job with the `nomad status <job-id>` command.
    
3.  Get the logs from the allocation with the `nomad logs -stderr <allocation-id>` command.
    
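Steps 2 and 3 can be combined in a script. The sketch below reads `nomad status <job-id>` output on stdin and prints the ID of each allocation listed under its `Allocations` heading. It assumes the default layout, where that table starts with an `ID` header row, has the allocation ID in its first column, and ends at a blank line or the end of the output. The IDs in this table are shortened prefixes, which `nomad` commands accept as long as they are unambiguous.

`````````shell
# Print allocation IDs from `nomad status <job-id>` output read on stdin.
# Assumes an "Allocations" heading followed by a table whose first column
# is the allocation ID and which ends at a blank line.
allocation_ids() {
  awk '
    /^Allocations/         { in_table = 1; next }
    in_table && $1 == "ID" { next }   # skip the table header row
    in_table && NF == 0    { exit }   # blank line ends the table
    in_table               { print $1 }
  '
}

# Example: print stderr logs for every allocation of a job.
#   for a in $(kubectl exec <nomad-server-pod-ID> -- nomad status <job-id> | allocation_ids); do
#     kubectl exec <nomad-server-pod-ID> -- nomad logs -stderr "$a"
#   done
`````````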

### Shutting down a Nomad client

When you want to shut down a Nomad client, you must first set the client to `drain` mode. In `drain` mode, the client will finish any jobs that have already been allocated but will not be allocated any new jobs.

1.  To drain a client, log in to the client and set it to drain mode with the `node-drain` command:
    
    `````````
    nomad node-drain -self -enable
    `````````
    
2.  Then, make sure the client is in drain mode using the `node-status` command:
    
    `````````
    nomad node-status -self
    `````````
    
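In a script, the check in step 2 can be automated. The sketch below reads `nomad node-status -self` output on stdin and succeeds if the `Drain` field starts with `true`. It assumes the `Drain = <value>` line format of the detailed node output; the prefix match also tolerates any text that follows the value.

`````````shell
# Succeed (exit 0) if `nomad node-status -self` output on stdin shows the
# node in drain mode, that is, a "Drain = true..." line.
is_draining() {
  awk -F '=' '
    $1 ~ /^Drain *$/ {
      value = $2
      sub(/^ +/, "", value)   # trim spaces after the "="
      if (value ~ /^true/) found = 1
    }
    END { exit !found }
  '
}

# Example:
#   nomad node-status -self | is_draining && echo "drain mode enabled"
`````````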

Alternatively, you can drain a remote node with the following command, substituting the node ID:

`````````
nomad node-drain -enable -yes <node-id>
`````````

### Scaling down the client cluster

To shut down clients safely, put each client in `drain` mode first, then wait for all of its jobs to finish before terminating it. To automate this when scaling down, you can configure an [ASG Lifecycle Hook](https://docs.aws.amazon.com/autoscaling/ec2/userguide/lifecycle-hooks.html) that triggers a script when an instance is scaled in.

The script should use the commands in the section above to do the following:

1.  Put the instance in drain mode.
    
2.  Monitor running jobs on the instance and wait for them to finish.
    
3.  Terminate the instance.
    
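A minimal sketch of such a script, meant to run on the client being removed, follows. It assumes the `nomad` CLI on the instance can reach the cluster, and it treats any row with a field equal to `running` in the `Allocations` table of `nomad node-status -self` as an unfinished job, a heuristic that depends on the default output layout. Termination is left to a hypothetical `terminate_instance` function; in an ASG lifecycle hook you would replace it with whatever completes the lifecycle action in your environment (for example `aws autoscaling complete-lifecycle-action`). Depending on your Nomad version, `node-drain` may itself wait for the drain and may force-stop allocations after a deadline, so check `nomad node-drain -h` for the options your version supports.

`````````shell
# Hypothetical scale-down helper for a Nomad client. Source this file,
# then call drain_wait_terminate on the instance being removed.
POLL_SECONDS="${POLL_SECONDS:-30}"

# Count allocations still running, from `nomad node-status -self` on stdin.
# Heuristic: any row in the Allocations table with a field equal to "running".
count_running_allocations() {
  awk '
    /^Allocations/         { in_table = 1; next }
    in_table && $1 == "ID" { next }
    in_table && NF == 0    { exit }
    in_table {
      for (i = 1; i <= NF; i++) if ($i == "running") { n++; break }
    }
    END { print n + 0 }
  '
}

# Placeholder: replace with your real termination step.
terminate_instance() {
  echo "terminate_instance: not implemented" >&2
  return 1
}

drain_wait_terminate() {
  # 1. Put the instance in drain mode so no new jobs are allocated.
  nomad node-drain -self -enable -yes || return 1
  # 2. Poll until no allocations on this node are running. A failed status
  #    query is retried rather than treated as "no running jobs".
  while :; do
    if node_out="$(nomad node-status -self)"; then
      running="$(printf '%s\n' "$node_out" | count_running_allocations)"
      [ "$running" -eq 0 ] && break
    fi
    sleep "$POLL_SECONDS"
  done
  # 3. Terminate the instance.
  terminate_instance
}
`````````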

## Next steps

*   Read the [Managing user accounts](https://circleci.com/docs/server-admin/server-4.4/operator/managing-user-accounts/) guide.