CircleCI Server Container Architecture
This document outlines the containerized services that run on the Services machine within a CircleCI Server installation. It gives an overview of how the services operate and helps with troubleshooting in the event of service outages. Supplementary notes and a key to the failure severity icons are provided ahead of the table.
Notes
- Database migrator services are listed here with a low failure severity because they only run at startup. However, if migrator services are down at startup, connected services will fail.
- With a platinum support contract, some services can be externalized (marked with * here) and managed to suit your requirements. Externalization provides higher data security and allows for redundancy to be built into your system.
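The first note, that a migrator being down at startup causes connected services to fail, can be sketched as a walk over a startup dependency map. This is a minimal illustration under stated assumptions: the map, the `cron-service-migrator` and `postgres` names, and the edges between them are hypothetical and are not taken from the table.

```python
from collections import deque


def services_failing_at_startup(depends_on, down):
    """Return every service that cannot start because a dependency is down.

    depends_on: dict mapping a service name to the names it needs at startup.
    down: iterable of service names that are unavailable at startup.
    """
    # Invert the map so we can walk from a failed service to its dependents.
    dependents = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(service)

    failed = set(down)
    queue = deque(failed)
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, ()):
            if service not in failed:
                failed.add(service)
                queue.append(service)
    return failed


# Hypothetical example: names and edges are assumptions for illustration only.
EXAMPLE_DEPENDENCIES = {
    "cron-service-migrator": ["postgres"],
    "cron-service": ["cron-service-migrator"],
    "frontend": [],
}
```

Calling `services_failing_at_startup(EXAMPLE_DEPENDENCIES, ["cron-service-migrator"])` reports both the migrator and the service that depends on it, which is why a migrator outage at startup matters more than its low severity rating suggests.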
Key
Icon | Description |
---|---|
 | Failure has a minor effect on production: no loss of data or functionality. |
 | Failure might cause issues with some jobs, but no loss of data. |
 | Failure can cause loss of data, corruption of jobs/workflows, or major loss of functionality. |
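When several services are down at once, the three severity levels in the key give a natural order for investigation. The sketch below models them as an ordered enum and sorts failed services so the most severe come first. The level names (`MINOR`, `MODERATE`, `CRITICAL`), the service names, and the severity assigned to each are assumptions made for illustration; they are not the ratings from the table.

```python
from enum import IntEnum


class FailureSeverity(IntEnum):
    MINOR = 1      # minor effect on production, no loss of data or functionality
    MODERATE = 2   # might cause issues with some jobs, but no loss of data
    CRITICAL = 3   # possible data loss, corrupted jobs/workflows, major loss of functionality


def triage(failed, severity_of):
    """Order failed services so the most severe failures come first.

    Ties are broken alphabetically so the result is deterministic.
    """
    return sorted(failed, key=lambda name: (-severity_of[name], name))


# Hypothetical service names and ratings, for illustration only.
EXAMPLE_SEVERITIES = {
    "metrics-forwarder": FailureSeverity.MINOR,
    "scheduler": FailureSeverity.MODERATE,
    "datastore": FailureSeverity.CRITICAL,
}
```

With this mapping, `triage(["metrics-forwarder", "datastore", "scheduler"], EXAMPLE_SEVERITIES)` puts the data store first and the metrics forwarder last.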
Containers, Roles, Failure Modes and Startup Dependencies
Container / Image | Role | What happens if it fails? | Failure severity | Startup dependencies |
---|---|---|---|---|
|
Provides a GraphQL API that supplies much of the data used to render the web frontend. |
Many parts of the UI (e.g. Contexts) will fail completely. |
|
|
|
Persists audit log events to blob storage for long-term retention. |
Some events may not be recorded. |
|
|
|
Stores and provides encrypted contexts. |
All builds using Contexts will fail. |
|
|
|
Runs PostgreSQL migrations for the |
Only runs at startup. |
|
|
|
Triggers scheduled workflows. |
Scheduled workflows will not run. |
|
|
|
Runs PostgreSQL migrations for the cron-service. |
Only runs at startup. |
|
|
|
Stores and provides information about our domain model. |
Workflows will fail to start and some REST API calls may fail causing |
|
|
|
Runs PostgreSQL migrations for the |
Only runs at startup. |
|
|
|
Mail Transfer Agent (MTA) used to send all outbound SMTP. |
No email notifications will be sent. |
None |
|
|
Stores user identities (LDAP). |
If LDAP authentication is in use, all logins will fail and some REST API calls might fail. |
only if LDAP in use |
|
|
Runs PostgreSQL migrations for the |
Only runs at startup. |
|
|
|
File storage service used as a replacement for S3 when CircleCI Server is run outside of AWS. Not used if Server is configured to use S3. Stores step output logs, artifacts, test results, caches and workspaces. |
If not using S3, builds will produce no output and some REST API calls might fail. |
if not using S3 |
None |
|
CircleCI web app and www-api proxy. |
The UI and REST API will be unavailable and no jobs will be triggered by GitHub/Enterprise. Running builds will be OK but no updates will be seen. |
|
|
|
Mongo data store. |
Potential total data loss. All running builds will fail and the UI will not work. |
|
|
|
Queries the nomad server for stats and sends them to statsd. |
Nomad metrics will be lost, but everything else should run as normal. |
None |
|
|
Receives job output and status updates and writes them to MongoDB. Also provides an API that running jobs use to access caches and workspaces, and to store caches, workspaces, artifacts, and test results. |
All running builds will either fail or be left in an unfixable, inconsistent state. There will also be data loss in terms of step output, test results and artifacts. |
None |
|
|
Provides the CircleCI permissions interface. |
Workflows will fail to start and some REST API calls may fail, causing 500 errors in the UI. |
|
|
|
Runs PostgreSQL migrations for the |
Only runs at startup. |
|
|
|
Splits a job into tasks and sends them to |
No jobs will be sent to Nomad. The run queue will grow, but there should be no meaningful loss of data. |
None |
|
|
Basic |
Potential total data loss. All running builds will fail and the UI will not work. |
None |
|
|
Runs the RabbitMQ server. Most of our services use RabbitMQ for queueing. |
Potential total data loss. All running builds will fail and the UI will not work. |
None |
|
|
The Redis key/value store. |
Output from currently running job steps will be lost. API calls out to GitHub may also fail. |
None |
|
|
Sends tasks to |
No jobs will be sent to Nomad. The run queue will grow, but there should be no meaningful loss of data. |
None |
|
|
Used to run any mongo conversion/upgrade scripts during mongo version upgrade. |
Not required to run all the time. |
None |
|
|
Nomad primary service. |
No 2.0 build jobs will run. |
None |
|
|
Called by Replicated to check whether other containers are ready. |
Only required at startup. If unavailable at startup, the whole system will fail. |
None |
|
|
Sends the user count to the internal CircleCI “phone home” endpoint. |
CircleCI will not receive usage stats for your installation, but there is no effect on operation. |
None |
|
|
Checks the |
1.0 Builder lifecycles will not be properly managed, but jobs will continue to run. |
None |
|
|
Provides real-time events to the CircleCI app. |
Live UI updates will stop but hard refreshes will still work. |
None |
|
|
The statsd forwarding agent that local services write to. It can be configured to forward to an external metrics service. |
Metrics will stop working but jobs will continue to run. |
None |
|
|
Used to manage log rotations for all containers on the services machine. |
If this stays down for a long period, the Services machine disk will eventually run out of space and other services will fail. |
None |
|
|
Parses test result files and stores data. |
There will be no test failure or timing data for jobs, but this will be back-filled once the service is restarted. |
None |
|
|
Instance of HashiCorp's Vault, an encryption service that provides key management, secure storage, and other encryption-related services. Used to handle the encryption and key store for the |
|
None |
|
|
Periodically checks for stale |
Old vm-service instances might not be destroyed until this service is restarted. |
|
|
|
Periodically requests that |
VM instances for |
|
|
|
Inventory of available |
Jobs that use |
|
|
|
Used to run database migrations for |
Only runs at startup. |
None |
|
|
Coordinates and provides information about workflows. |
No new workflows will start, currently running workflows might end up in an inconsistent state, and some REST and GraphQL API requests will fail. |
|
|
|
Runs PostgreSQL migrations for the |
Only runs at startup. |
|
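
When troubleshooting an outage, a first step is often to see which of the containers in the table are actually running. The following is a minimal sketch, assuming the containers on the Services machine are managed by Docker and that the `docker` CLI is available to the current user. The `expected` list is something you would supply yourself; no container names are assumed here.

```python
import subprocess


def running_container_names():
    """Return the names of running containers reported by `docker ps`."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the docker command fails
    )
    return {line.strip() for line in result.stdout.splitlines() if line.strip()}


def missing_containers(expected, running):
    """Return expected container names that are not currently running, sorted."""
    return sorted(set(expected) - set(running))


if __name__ == "__main__":
    # Replace with the container names from your own installation.
    expected = []
    for name in missing_containers(expected, running_container_names()):
        print(f"not running: {name}")
```

Migrator containers only run at startup, so they will normally appear as not running on a healthy system and are best left out of the `expected` list.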