CircleCI Server Container Architecture

This document outlines the containerized services that run on the Services machine in a CircleCI Server installation. It is intended both to give an overview of service operation and to help with troubleshooting in the event of a service outage. Supplementary notes and a key are provided ahead of the table.

Notes

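  • The database migration services listed here run only at startup, so their failure is rated as low severity.

    If a migration service is down at startup, however, the service it connects to will not work.
  • With Platinum Support, some services (marked with * here) can be externalized and managed to suit your requirements. Externalizing them improves data safety and gives the system redundancy.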

Key

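Icon Description

Failure has a minor impact on production, with no loss of data or functionality.

Failure might cause problems with some jobs, but no data loss.

Failure can cause data loss, corrupted jobs/workflows, and a major loss of functionality.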

Containers, Roles, Failure Modes and Startup Dependencies

Container / Image | Role | What happens if it fails? | Failure severity | Startup dependencies

api-service

Provides a GraphQL API. This API supplies much of the data used to render the web frontend.

Many parts of the UI (e.g. contexts) will fail completely.

postgres, frontend, contexts-service-migrator, contexts-service, vault-cci

audit-log-service

Persists audit log events to blob storage for long-term retention.

Some events will not be recorded.

postgres, frontend

contexts-service

Stores and serves encrypted contexts.

All builds that use contexts will fail.

postgres, frontend, contexts-service-migrator, vault-cci

contexts-service-migrator

Runs PostgreSQL migrations for `contexts-service`.

Only runs at startup.

postgres, frontend

cron-service

Triggers scheduled workflows.

Scheduled workflows will not run.

postgres, frontend, cron-service-migrator

cron-service-migrator

Runs PostgreSQL migrations for cron-service.

Only runs at startup.

postgres, frontend

domain-service

Stores and serves information about the CircleCI domain model.

Workflows will fail to start. Some REST API calls may fail, causing 500 errors in the CircleCI UI. If LDAP authentication is in use, all logins will fail.

postgres, frontend, domain-service-migrator

domain-service-migrator

Runs PostgreSQL migrations for `domain-service`.

Only runs at startup.

postgres, frontend

exim

Mail Transfer Agent (MTA) used to send all outbound SMTP.

No email notifications will be sent.

None

federation-service

Stores user identities (LDAP).

If LDAP authentication is in use, all logins will fail and some REST API calls might fail.

only if LDAP in use

postgres, frontend, federations-service-migrator

federation-service-migrator

Runs PostgreSQL migrations for the federations-service.

Only runs at startup.

postgres, frontend

fileserved

File storage service used as a replacement for S3 when CircleCI Server is run outside of AWS. Not used if Server is configured to use S3. Stores step output logs, artifacts, test results, caches and workspaces.

If not using S3, builds will produce no output and some REST API calls might fail.

if not using S3

None

frontend

CircleCI web app and www-api proxy.

The UI and REST API will be unavailable and no jobs will be triggered by GitHub/Enterprise. Running builds will be OK but no updates will be seen.

postgres

mongo *

Mongo data store.

Potential total data loss. All running builds will fail and the UI will not work.

mongodb-upgrader

nomad-metrics

Queries the nomad server for stats and sends them to statsd.

Nomad metrics will be lost, but everything else should run as normal.

None

output-processor / output-processing

Receives job output and status updates and writes them to MongoDB. Also provides an API that running jobs use to access caches and workspaces, and to store caches, workspaces, artifacts, and test results.

All running builds will either fail or be left in an unfixable, inconsistent state. There will also be data loss in terms of step output, test results and artifacts.

None

permissions-service

Provides the CircleCI permissions interface.

Workflows will fail to start and some REST API calls may fail, causing 500 errors in the UI.

postgres, frontend, permissions-service-migrator

permissions-service-migrator

Runs PostgreSQL migrations for the permissions-service.

Only runs at startup.

postgres, frontend

picard-dispatcher

Splits a job into tasks and sends them to schedulerer to be run.

No jobs will be sent to Nomad. The run queue will grow, but there should be no meaningful loss of data.

None

postgres / postgres-script-enhance *

Basic PostgreSQL with enhancements for creating required databases when containers are launched.

Potential total data loss. All running builds will fail and the UI will not work.

None

rabbitmq / rabbitmq-delayed *

Runs the RabbitMQ server. Most of our services use RabbitMQ for queueing.

Potential total data loss. All running builds will fail and the UI will not work.

None

outputRunningRedis / redis *

The Redis key/value store.

Output from currently running job steps will be lost. API calls out to GitHub may also fail.

None

schedulerer

Sends tasks to server-nomad to run.

No jobs will be sent to Nomad. The run queue will grow, but there should be no meaningful loss of data.

None

mongodb-upgrader / server-mongo-upgrader

Used to run any mongo conversion/upgrade scripts during mongo version upgrade.

Not required to run all the time.

None

nomad_server / server-nomad *

Nomad primary service.

No 2.0 build jobs will run.

None

ready-agent / server-ready-agent

Called by Replicated to check whether other containers are ready.

Only required on startup. If unavailable on startup the whole system will fail.

None

server-usage-stats

Sends the user count to the internal CircleCI “phone home” endpoint.

CircleCI will not receive usage stats for your install, but operation is not affected.

None

shutdown-hook-poller

Checks the frontend container for 1.0 Builder shutdown requests. If a request is found, the 1.0 Builder is shut down.

1.0 Builder lifecycles will not be properly managed, but jobs will continue to run.

None

slanger

Provides real-time events to the CircleCI app.

Live UI updates will stop but hard refreshes will still work.

None

telegraf

The statsd forwarding agent that local services write to. It can be configured to forward metrics to an external metrics service.

Metrics will stop working but jobs will continue to run.

None

tutum/logrotate

Used to manage log rotations for all containers on the services machine.

If this stays down for a long period the Services machine disk will eventually run out of space and other services will fail.

None

test-results

Parses test result files and stores data.

There will be no test failure or timing data for jobs, but this will be back-filled once the service is restarted.

None

contexts-vault / vault-cci *

An instance of HashiCorp’s Vault, an encryption service that provides key management, secure storage, and other encryption-related services. Used to handle the encryption and key store for the contexts-service.

contexts-service will stop working, and all jobs that use contexts-service will fail.

None

vm-gc

Periodically checks for stale machine and remote Docker instances and requests that vm-service remove them.

Old vm-service instances might not be destroyed until this service is restarted.

vm-service-db-migrator

vm-scaler

Periodically requests that vm-service provision more instances for running machine and remote Docker jobs.

VM instances for machine and remote Docker jobs might not be provisioned, so you could run out of capacity to run jobs that use these executors.

vm-service-db-migrator

vm-service

Inventory of available vm-service instances, and provisioning of new instances.

Jobs that use machine or remote Docker will fail.

vm-service-db-migrator

vm-service-db-migrator

Used to run database migrations for vm-service.

Only runs at startup.

None

workflows-conductor

Coordinates and provides information about workflows.

No new workflows will start, currently running workflows might end up in an inconsistent state, and some REST and GraphQL API requests will fail.

postgres, frontend, workflows-conductor-migrator

workflows-conductor-migrator

Runs PostgreSQL migrations for the workflows-conductor.

Only runs at startup.

postgres, frontend
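When troubleshooting a startup problem, it can help to treat the "Startup dependencies" column as a graph. The following is a minimal sketch, not a description of how Replicated actually orders containers: it copies the dependencies of a few rows from the table into a dictionary, derives one valid startup order, and lists the containers that name a given container as a direct or transitive startup dependency. The names `STARTUP_DEPS`, `startup_order`, and `blocked_by` are hypothetical and exist only for this illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Assumption: a hand-copied subset of the "Startup dependencies" column.
# Each key is a container; each value is the set it needs at startup.
STARTUP_DEPS = {
    "postgres": set(),
    "vault-cci": set(),
    "frontend": {"postgres"},
    "contexts-service-migrator": {"postgres", "frontend"},
    "contexts-service": {
        "postgres", "frontend", "contexts-service-migrator", "vault-cci",
    },
    "api-service": {
        "postgres", "frontend", "contexts-service-migrator",
        "contexts-service", "vault-cci",
    },
}


def startup_order(deps):
    """Return one ordering in which every container follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def blocked_by(failed, deps):
    """Return containers that depend on `failed` at startup, directly or not."""
    blocked = set()
    frontier = [failed]
    while frontier:
        current = frontier.pop()
        for name, needs in deps.items():
            if current in needs and name not in blocked:
                blocked.add(name)
                frontier.append(name)
    return blocked


if __name__ == "__main__":
    print(startup_order(STARTUP_DEPS))
    print(sorted(blocked_by("vault-cci", STARTUP_DEPS)))
```

This only models startup ordering. The runtime impact of a failure is described in the "What happens if it fails?" column and can be wider than the set this sketch reports.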


