Docs
circleci.com
Start Building for Free

Using metrics

2 weeks ago1 min read
Server v4.x
Server Admin
On This Page

Metrics such as CPU or memory usage and internal metrics can help you manage and debug your server installation in many ways, including:

  • Quickly detecting incidents and abnormal behavior.

  • Dynamically scaling compute resources.

  • Retroactively understanding infrastructure-wide issues.

Metrics collection

Scope

Your CircleCI server installation collects a number of metrics and logs by default, which can be useful in monitoring the health of your system and debugging issues with your installation.

Telegraf

Most services running on server report StatsD metrics to the Telegraf pod running in server. The configuration is fully customizable, so you can forward your metrics from Telegraf to any output supported by Telegraf through output plugins. By default, it will provide a metrics endpoint for Prometheus to scrape.

Use Telegraf to forward metrics to Datadog

The following example shows how to configure Telegraf to output metrics to Datadog:

In your custom values.yaml for your CircleCI server install you may already have a section for Telegraf as below. If not you can add it now.

...
telegraf:
  args:
    - --config
    - /etc/telegraf/telegraf.d/telegraf_custom.conf
  config:
    custom_config_file: |
      [agent]
        collection_jitter = "0s"
        debug = false
        flush_interval = "10s"
        flush_jitter = "0s"
        hostname = "$HOSTNAME"
        interval = "30s"
        logfile = ""
        metric_batch_size = 1000
        metric_buffer_limit = 10000
        omit_hostname = true
        precision = ""
        quiet = false
        round_interval = true

      [[processors.enum]]
        [[processors.enum.mapping]]
          dest = "status_code"
          field = "status"
          [processors.enum.mapping.value_mappings]
              critical = 3
              healthy = 1
              problem = 2

      [[inputs.statsd]]
        datadog_extensions = true
        metric_separator = "."
        percentile_limit = 1000
        percentiles = [
          50,
          95,
          99
        ]
        service_address = ":8125"
      [[inputs.internal]]
        collect_memstats = false

      [[outputs.file]]
        files = ["stdout"]

      [[outputs.prometheus_client]]
        listen = ":9273"
        path = "/metrics"
...

In the telgraf.config.custom_config_file block of your helm values.yaml, add the as seen below to the bottom of the config block, replacing "my-secret-key" with your datadog API key.

telegraf:
  config:
    custom_config_file: |
      ...
      [[outputs.datadog]]
        ## Replace "my-secret-key" with Datadog API key
        apikey = "my-secret-key"
      ...

Finally, you will need to apply the changes via helm update.

helm upgrade circleci-server oci://cciserver.azurecr.io/circleci-server -n <namespace> --version <version> -f <path-to-values.yaml> --username $USERNAME --password $PASSWORD

Note: If your Telegraf pod does not restart after your helm upgrade, then the changes will not take effect. You may gracefully restart your Telegraf instance with command below. This will restart Telegraf with zero downtime.

kubectl rollout restart deploy telegraf -n <namespace>

Next steps


Help make this document better

This guide, as well as the rest of our docs, are open source and available on GitHub. We welcome your contributions.

Need support?

Our support engineers are available to help with service issues, billing, or account related questions, and can help troubleshoot build configurations. Contact our support engineers by opening a ticket.

You can also visit our support site to find support articles, community forums, and training resources.