CircleCI Server Container Architecture

This document outlines the containerized services that run on the Services machine within a CircleCI Server installation. It gives an overview of how the services operate and helps with troubleshooting in the event of a service outage. Supplementary notes and a key precede the table.
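As a troubleshooting aid, the following is a minimal sketch of how one might compare the containers currently running on the Services machine against a list of expected long-running services. It assumes Docker is available on the host and that container names match the names used in this document exactly, which may not hold for every installation. The function names and the sample lists are hypothetical; migrator services are left out because they only run at startup.

```python
import subprocess


def running_container_names():
    """Return the names reported by `docker ps` (requires Docker on the host)."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()


def missing_services(expected, running):
    """Return the expected services that have no running container.

    Assumption: container names equal the service names exactly.
    """
    running_set = set(running)
    return [name for name in expected if name not in running_set]


# Canned example data, so the sketch runs without Docker.
expected = ["postgres", "frontend", "rabbitmq", "slanger"]
running = ["postgres", "frontend", "slanger"]
print(missing_services(expected, running))
```

On a real Services machine, `missing_services(expected, running_container_names())` would perform the same comparison against live data.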

Notes

  • Database migrator services are listed here with a low failure severity because they only run at startup. However, if a migrator service is down at startup, the services that depend on it will fail.
  • With a platinum support contract, some services (marked with * here) can be externalized and managed to suit your requirements. Externalization provides higher data security and allows for redundancy to be built into your system.
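The first note can be illustrated with a small sketch that works out which services are blocked at startup when a given migrator is down. The dependency data is a subset copied from the Startup dependencies column of the table in this document. The function name and the traversal are illustrative assumptions, not a description of how CircleCI Server itself handles startup.

```python
# Subset of the "Startup dependencies" column from the table in this document.
STARTUP_DEPENDENCIES = {
    "postgres": [],
    "vault-cci": [],
    "frontend": ["postgres"],
    "contexts-service-migrator": ["postgres", "frontend"],
    "contexts-service": [
        "postgres", "frontend", "contexts-service-migrator", "vault-cci",
    ],
    "api-service": [
        "postgres", "frontend", "contexts-service-migrator",
        "contexts-service", "vault-cci",
    ],
    "cron-service-migrator": ["postgres", "frontend"],
    "cron-service": ["postgres", "frontend", "cron-service-migrator"],
}


def blocked_at_startup(failed, dependencies=STARTUP_DEPENDENCIES):
    """Return every service that depends, directly or transitively, on `failed`."""
    blocked = set()
    frontier = [failed]
    while frontier:
        current = frontier.pop()
        for service, deps in dependencies.items():
            if current in deps and service not in blocked:
                blocked.add(service)
                frontier.append(service)
    return sorted(blocked)


print(blocked_at_startup("contexts-service-migrator"))
# ['api-service', 'contexts-service']
print(blocked_at_startup("cron-service-migrator"))
# ['cron-service']
```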

Key

Icon Description

Failure has a minor effect on production - no loss of data or functioning.

Failure might cause issues with some jobs, but no loss of data.

Failure can cause loss of data, corruption of jobs/workflows, major loss of functionality.

Containers, Roles, Failure Modes and Startup Dependencies

Container / Image | Role | What happens if it fails? | Failure severity | Startup dependencies

api-service

Provides a GraphQL API that supplies much of the data used to render the web frontend.

Many parts of the UI (e.g. Contexts) will fail completely.

postgres, frontend, contexts-service-migrator, contexts-service, vault-cci

audit-log-service

Persists audit log events to blob storage for long-term retention.

Some events may not be recorded.

postgres, frontend

contexts-service

Stores and provides encrypted contexts.

All builds using Contexts will fail.

postgres, frontend, contexts-service-migrator, vault-cci

contexts-service-migrator

Runs postgresql migrations for the contexts-service.

Only runs at startup.

postgres, frontend

cron-service

Triggers scheduled workflows.

Scheduled workflows will not run.

postgres, frontend, cron-service-migrator

cron-service-migrator

Runs postgresql migrations for the cron-service.

Only runs at startup.

postgres, frontend

domain-service

Stores and provides information about our domain model.

Workflows will fail to start and some REST API calls may fail, causing 500 errors in the CircleCI UI.

postgres, frontend, domain-service-migrator

domain-service-migrator

Runs postgresql migrations for the domain-service.

Only runs at startup.

postgres, frontend

exim

Mail Transfer Agent (MTA) used to send all outbound SMTP.

No email notifications will be sent.

None

federation-service

Stores user identities (LDAP).

If LDAP authentication is in use, all logins will fail and some REST API calls might fail.

only if LDAP is in use

postgres, frontend, federations-service-migrator

federation-service-migrator

Runs postgresql migrations for the federations-service.

Only runs at startup.

postgres, frontend

fileserved

File storage service used as a replacement for S3 when CircleCI Server is run outside of AWS. Not used if Server is configured to use S3. Stores step output logs, artifacts, test results, caches and workspaces.

If not using S3, builds will produce no output and some REST API calls might fail.

if not using S3

None

frontend

CircleCI web app and www-api proxy.

The UI and REST API will be unavailable and no jobs will be triggered by GitHub or GitHub Enterprise. Running builds will be unaffected, but their updates will not be visible.

postgres

mongo *

Mongo data store.

Potential total data loss. All running builds will fail and the UI will not work.

mongodb-upgrader

nomad-metrics

Queries the nomad server for stats and sends them to statsd.

Nomad metrics will be lost, but everything else should run as normal.

None

output-processor / output-processing

Receives job output and status updates and writes them to MongoDB. Also provides an API that running jobs use to access caches and workspaces, and to store caches, workspaces, artifacts, and test results.

All running builds will either fail or be left in an unfixable, inconsistent state. There will also be data loss in terms of step output, test results and artifacts.

None

permissions-service

Provides the CircleCI permissions interface.

Workflows will fail to start and some REST API calls may fail, causing 500 errors in the UI.

postgres, frontend, permissions-service-migrator

permissions-service-migrator

Runs postgresql migrations for the permissions-service.

Only runs at startup.

postgres, frontend

picard-dispatcher

Splits a job into tasks and sends them to schedulerer to be run.

No jobs will be sent to Nomad. The run queue will increase in size, but there should be no meaningful loss of data.

None

postgres / postgres-script-enhance *

Basic postgresql with enhancements for creating required databases when containers are launched.

Potential total data loss. All running builds will fail and the UI will not work.

None

rabbitmq / rabbitmq-delayed *

Runs the RabbitMQ server. Most of our services use RabbitMQ for queueing.

Potential total data loss. All running builds will fail and the UI will not work.

None

outputRunningRedis / redis *

The Redis key/value store.

Output from currently running job steps will be lost. API calls out to GitHub may also fail.

None

schedulerer

Sends tasks to server-nomad to run.

No jobs will be sent to Nomad. The run queue will increase in size, but there should be no meaningful loss of data.

None

mongodb-upgrader / server-mongo-upgrader

Used to run any mongo conversion/upgrade scripts during mongo version upgrade.

Not required to run all the time.

None

nomad_server / server-nomad *

Nomad primary service.

No 2.0 build jobs will run.

None

ready-agent / server-ready-agent

Called by Replicated to check whether other containers are ready.

Only required at startup. If unavailable at startup, the whole system will fail.

None

server-usage-stats

Sends the user count to the internal CircleCI “phone home” endpoint.

CircleCI will not receive usage stats for your installation, but there is no effect on operation.

None

shutdown-hook-poller

Checks the frontend container for 1.0 Builder shutdown requests. If a request is found, the 1.0 Builder is shut down.

1.0 Builder lifecycles will not be properly managed, but jobs will continue to run.

None

slanger

Provides real-time events to the CircleCI app.

Live UI updates will stop but hard refreshes will still work.

None

telegraf

The statsd forwarding agent that local services write to. It can be configured to forward to an external metrics service.

Metrics will stop working but jobs will continue to run.

None

tutum/logrotate

Used to manage log rotations for all containers on the services machine.

If this stays down for a long period, the Services machine disk will eventually run out of space and other services will fail.

None

test-results

Parses test result files and stores data.

There will be no test failure or timing data for jobs, but this will be back-filled once the service is restarted.

None

contexts-vault / vault-cci *

Instance of HashiCorp’s Vault – an encryption service that provides key management, secure storage, and other encryption-related services. Used to handle the encryption and key store for the contexts-service.

contexts-service will stop working, and all jobs that use contexts-service will fail.

None

vm-gc

Periodically checks for stale machine and remote Docker instances and requests that vm-service remove them.

Old vm-service instances might not be destroyed until this service is restarted.

vm-service-db-migrator

vm-scaler

Periodically requests that vm-service provision more instances for running machine and remote Docker jobs.

VM instances for machine and remote Docker jobs might not be provisioned, causing you to run out of capacity to run jobs with these executors.

vm-service-db-migrator

vm-service

Maintains an inventory of available vm-service instances and provisions new instances.

Jobs that use machine or remote Docker will fail.

vm-service-db-migrator

vm-service-db-migrator

Used to run database migrations for vm-service.

Only runs at startup.

None

workflows-conductor

Coordinates and provides information about workflows.

No new workflows will start, currently running workflows might end up in an inconsistent state, and some REST and GraphQL API requests will fail.

postgres, frontend, workflows-conductor-migrator

workflows-conductor-migrator

Runs postgresql migrations for the workflows-conductor.

Only runs at startup.

postgres, frontend
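The Startup dependencies column describes a dependency graph, so a valid startup order can be derived from it. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later) on a subset of the rows above, grouping services into waves that could start together. It only illustrates what the column implies; the function name is hypothetical, and the installer's actual ordering logic is not described in this document.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Subset of the table above: service -> services it needs at startup.
DEPENDENCY_SUBSET = {
    "postgres": set(),
    "frontend": {"postgres"},
    "domain-service-migrator": {"postgres", "frontend"},
    "domain-service": {"postgres", "frontend", "domain-service-migrator"},
    "workflows-conductor-migrator": {"postgres", "frontend"},
    "workflows-conductor": {
        "postgres", "frontend", "workflows-conductor-migrator",
    },
}


def startup_waves(dependencies):
    """Group services into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(startup_waves(DEPENDENCY_SUBSET), start=1):
    print(number, ", ".join(wave))
# 1 postgres
# 2 frontend
# 3 domain-service-migrator, workflows-conductor-migrator
# 4 domain-service, workflows-conductor
```

Reading the waves in reverse gives a quick view of which failures matter most at startup: anything in an early wave blocks everything after it.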