
Integrate TiDB Cloud with Prometheus and Grafana (Preview)



TiDB Cloud provides a Prometheus API endpoint. If you have a Prometheus service, you can easily monitor key TiDB Cloud metrics from this endpoint.

This document describes how to configure your Prometheus service to read key metrics from the TiDB Cloud Essential endpoint and how to view the metrics using Grafana.

Prerequisites

  • To integrate TiDB Cloud with Prometheus, you must have a self-hosted or managed Prometheus service.

  • To set up third-party metrics integration for TiDB Cloud, you must have the Organization Owner or Instance Manager role in TiDB Cloud. To view the integration page, you need at least the Project Viewer or Instance Viewer role for the target TiDB Cloud Essential cluster in your organization.

Limitations

  • Prometheus and Grafana integrations are not available for TiDB Cloud Starter clusters.
  • Prometheus and Grafana integrations are not available when the cluster status is CREATING, RESTORING, PAUSED, or RESUMING.

Steps

Step 1. Get a scrape_config file for Prometheus

Before configuring your Prometheus service to read TiDB Cloud metrics, generate a scrape_config YAML file in TiDB Cloud. The scrape_config file contains a unique bearer token that allows the Prometheus service to monitor your target cluster.

  1. In the TiDB Cloud console, navigate to the Clusters page, and then click the name of your target TiDB Cloud Essential cluster to go to its overview page.
  2. In the left navigation pane, click Integrations > Integration to Prometheus (Preview).
  3. Click Add File to generate and show the scrape_config file for the current TiDB Cloud Essential cluster.
  4. Make a copy of the scrape_config file content for later use.
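Before adding the file to Prometheus, you might want to confirm that its bearer token works. Prometheus sends the token from a scrape job as an `Authorization: Bearer <token>` header, so the same request can be reproduced by hand. The following is a minimal sketch using only the Python 3 standard library; `SCRAPE_URL` and `BEARER_TOKEN` are hypothetical placeholders that you fill in from your own scrape_config file (in a standard Prometheus scrape job, the URL is assembled from `scheme`, a host listed under `static_configs` targets, and `metrics_path`). It lists the metric names returned in the Prometheus text exposition format.

```python
import urllib.request

# Hypothetical placeholders: copy the real values from your scrape_config file.
SCRAPE_URL = "https://<target-host>/<metrics-path>"
BEARER_TOKEN = "<bearer-token>"


def build_scrape_request(url: str, token: str) -> urllib.request.Request:
    """Build a GET request that carries the bearer token the way Prometheus sends it."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def metric_names(exposition_text: str) -> set[str]:
    """Collect metric names from Prometheus text exposition format.

    Blank lines and comment lines (# HELP, # TYPE) are skipped. For sample lines,
    the metric name is everything before the first '{' or space.
    """
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


def fetch_metric_names(url: str, token: str, timeout: float = 10.0) -> set[str]:
    """Scrape the endpoint once and return the metric names it exposes."""
    request = build_scrape_request(url, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return metric_names(response.read().decode("utf-8"))
```

After filling in the placeholders, calling `fetch_metric_names(SCRAPE_URL, BEARER_TOKEN)` should return names starting with `tidbcloud_` if the token is valid; an HTTP error suggests the token or URL was copied incorrectly.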

Step 2. Integrate with Prometheus

  1. In the monitoring directory specified by your Prometheus service, locate the Prometheus configuration file.

    For example, /etc/prometheus/prometheus.yml.

  2. In the Prometheus configuration file, locate the scrape_configs section, and then paste the scrape_config file content obtained from TiDB Cloud into that section.

  3. In your Prometheus service, check Status > Targets to verify that the new scrape_config file has been read. If not, you might need to restart the Prometheus service.
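The Status > Targets check can also be scripted against the Prometheus HTTP API: its `/api/v1/targets` endpoint reports each active target together with its `scrapePool` (the job name) and `health`. A minimal sketch, assuming Prometheus listens on the hypothetical address `http://localhost:9090` and that `JOB_NAME` (also a placeholder) is the `job_name` from your TiDB Cloud scrape_config file:

```python
import json
import urllib.request

# Hypothetical values: adjust to your Prometheus address and scrape job name.
PROMETHEUS_URL = "http://localhost:9090"
JOB_NAME = "<job_name from your scrape_config file>"


def target_health(targets_response: dict, job_name: str) -> list[str]:
    """Return the health ("up", "down", or "unknown") of each active target in one job."""
    active = targets_response.get("data", {}).get("activeTargets", [])
    return [t.get("health", "unknown") for t in active if t.get("scrapePool") == job_name]


def job_is_up(prometheus_url: str, job_name: str, timeout: float = 10.0) -> bool:
    """True if Prometheus has loaded the job and every one of its targets is healthy."""
    with urllib.request.urlopen(f"{prometheus_url}/api/v1/targets", timeout=timeout) as resp:
        health = target_health(json.load(resp), job_name)
    # An empty list means the job is not loaded yet, for example before a restart.
    return bool(health) and all(h == "up" for h in health)
```

If `job_is_up(PROMETHEUS_URL, JOB_NAME)` returns `False` because the job is missing entirely, Prometheus has probably not reloaded its configuration yet.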

Step 3. Use Grafana GUI dashboards to visualize the metrics

After your Prometheus service reads metrics from TiDB Cloud, you can use Grafana GUI dashboards to visualize the metrics as follows:

  1. Download the Grafana dashboard JSON file for TiDB Cloud Essential from the following link:

    https://github.com/pingcap/docs/blob/master/tidb-cloud/monitor-prometheus-and-grafana-integration-tidb-cloud-dynamic-tracker-essential.json

  2. Import this JSON file into your Grafana instance to visualize the metrics.

  3. (Optional) Customize the dashboard as needed by adding or removing panels, changing data sources, and modifying display options.
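If you prefer to automate the import, Grafana's HTTP API accepts a dashboard through `POST /api/dashboards/db`. The following sketch assumes a Grafana instance at the hypothetical address `http://localhost:3000`, a service account token with permission to create dashboards, and a local copy of the JSON file saved under the hypothetical name `tidb-cloud-essential.json`. It clears the `id` field so that Grafana creates a new dashboard. It does not resolve data source placeholders, so if the downloaded file references a data source input, importing through the Grafana UI is the simpler route.

```python
import json
import urllib.request

# Hypothetical values: replace with your Grafana address, token, and local file path.
GRAFANA_URL = "http://localhost:3000"
GRAFANA_TOKEN = "<service-account-token>"
DASHBOARD_FILE = "tidb-cloud-essential.json"


def build_import_payload(dashboard: dict, overwrite: bool = False) -> dict:
    """Wrap an exported dashboard in the request body for POST /api/dashboards/db."""
    dashboard = dict(dashboard)  # shallow copy so the caller's dict is left untouched
    dashboard["id"] = None       # let Grafana assign a new internal ID
    return {"dashboard": dashboard, "overwrite": overwrite}


def import_dashboard(grafana_url: str, token: str, path: str) -> dict:
    """Read the dashboard JSON from disk, create it in Grafana, and return the response."""
    with open(path, encoding="utf-8") as f:
        payload = build_import_payload(json.load(f))
    request = urllib.request.Request(
        f"{grafana_url}/api/dashboards/db",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)
```

Setting `overwrite=True` replaces an existing dashboard with the same `uid`, which can be convenient when re-importing an updated version of the file.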

For more information about how to use Grafana, see Grafana documentation.

Best practice for rotating scrape_config

To improve data security, periodically rotate the bearer tokens in your scrape_config files as follows:

  1. Follow Step 1 to create a new scrape_config file for Prometheus.
  2. Add the content of the new file to your Prometheus configuration file.
  3. Once you confirm that your Prometheus service can read from TiDB Cloud, remove the content of the old scrape_config file from your Prometheus configuration file.
  4. On the Integrations page of your cluster, delete the corresponding old scrape_config file to block anyone else from using it to read from the TiDB Cloud Prometheus endpoint.
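The order of these steps matters: the old job is removed only after the new one is confirmed healthy, so metric collection has no gap. The following is a minimal sketch of that rule. It assumes each scrape job is represented as a Python dict with a `job_name` key (for example, after loading prometheus.yml with a YAML library of your choice), that the new file uses a different `job_name` from the old one (Prometheus requires job names to be unique), and that per-job target health is already known. The job names in the usage note are hypothetical.

```python
def add_job(scrape_configs: list[dict], new_job: dict) -> list[dict]:
    """Step 2: add the new scrape job next to the old one."""
    if any(job["job_name"] == new_job["job_name"] for job in scrape_configs):
        raise ValueError(f"duplicate job_name: {new_job['job_name']}")
    return scrape_configs + [new_job]


def remove_old_job(scrape_configs: list[dict], old_name: str, new_name: str,
                   health_by_job: dict[str, list[str]]) -> list[dict]:
    """Step 3: drop the old job only if every target of the new job reports "up"."""
    new_health = health_by_job.get(new_name, [])
    if not new_health or any(h != "up" for h in new_health):
        raise RuntimeError(f"{new_name} is not healthy yet; keep {old_name} for now")
    return [job for job in scrape_configs if job["job_name"] != old_name]
```

For example, `add_job([{"job_name": "tidbcloud-old"}], {"job_name": "tidbcloud-new"})` keeps both jobs, and `remove_old_job` refuses to drop `tidbcloud-old` until `tidbcloud-new` reports `up`. Step 4, deleting the old file in the TiDB Cloud console, remains a manual action.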

Metrics available to Prometheus

Prometheus tracks the following metric data for your cluster.

| Metric name | Metric type | Labels | Description |
| :--- | :--- | :--- | :--- |
| tidbcloud_db_total_connection | gauge | instance_id: <instance id>; instance_name: <instance name> | The number of current connections in your TiDB server |
| tidbcloud_db_active_connections | gauge | instance_id: <instance id>; instance_name: <instance name> | The number of active connections |
| tidbcloud_db_disconnections | gauge | result: Error\|...; instance_id: <instance id>; instance_name: <instance name> | The number of disconnected clients, grouped by connection result |
| tidbcloud_db_database_time | gauge | sql_type: Select\|Insert\|...; instance_id: <instance id>; instance_name: <instance name> | A time model statistic that represents the sum of all processes' CPU consumption plus the sum of non-idle wait time |
| tidbcloud_db_query_per_second | gauge | type: Select\|Insert\|...; instance_id: <instance id>; instance_name: <instance name> | The number of SQL statements executed per second, grouped by statement type |
| tidbcloud_db_failed_queries | gauge | type: planner:xxx\|executor:2345\|...; instance_id: <instance id>; instance_name: <instance name> | The number of errors per second that occur when executing SQL statements, grouped by error type (for example, syntax errors and primary key conflicts) |
| tidbcloud_db_command_per_second | gauge | type: Query\|Ping\|...; instance_id: <instance id>; instance_name: <instance name> | The number of commands processed by TiDB per second |
| tidbcloud_db_queries_using_plan_cache_ops | gauge | instance_id: <instance id>; instance_name: <instance name> | The number of queries per second that hit the execution plan cache |
| tidbcloud_db_average_query_duration | gauge | sql_type: Select\|Insert\|...; instance_id: <instance id>; instance_name: <instance name> | The average duration from the time a request is sent to TiDB until the response is returned to the client |
| tidbcloud_db_transaction_per_second | gauge | type: Commit\|Rollback\|...; txn_mode: optimistic\|pessimistic; instance_id: <instance id>; instance_name: <instance name> | The number of transactions executed per second |
| tidbcloud_db_row_storage_used_bytes | gauge | instance_id: <instance id>; instance_name: <instance name> | The row-based storage size of the cluster in bytes |
| tidbcloud_db_columnar_storage_used_bytes | gauge | instance_id: <instance id>; instance_name: <instance name> | The columnar storage size of the cluster in bytes. Returns 0 if TiFlash is not enabled. |
| tidbcloud_resource_manager_resource_request_unit_total | gauge | instance_id: <instance id>; instance_name: <instance name> | The total Request Units (RU) consumed |
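Once these metrics are in Prometheus, you can query them with PromQL, either in Grafana panels or through the Prometheus HTTP API endpoint `/api/v1/query`. The following sketch assumes Prometheus at the hypothetical address `http://localhost:9090`. The example query sums `tidbcloud_db_query_per_second` by its `type` label, and the helper converts an instant-vector response into a plain dictionary keyed by that label.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # hypothetical address
# Example PromQL: queries per second for each statement type.
QPS_BY_TYPE = "sum by (type) (tidbcloud_db_query_per_second)"


def build_query_url(prometheus_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{prometheus_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def vector_to_dict(response: dict, label: str) -> dict[str, float]:
    """Map each series' value for `label` to its sample value (instant-vector result)."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error')}")
    return {
        item["metric"].get(label, ""): float(item["value"][1])  # value is [timestamp, "string"]
        for item in response["data"]["result"]
    }


def query(prometheus_url: str, promql: str, label: str) -> dict[str, float]:
    """Run an instant query and return the result keyed by one label."""
    with urllib.request.urlopen(build_query_url(prometheus_url, promql), timeout=10) as resp:
        return vector_to_dict(json.load(resp), label)
```

For example, `query(PROMETHEUS_URL, QPS_BY_TYPE, "type")` returns a mapping such as statement type to queries per second. The same pattern works for the other metrics in the table by changing the PromQL expression and the grouping label.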

FAQ

  • Why does the same metric have different values on Grafana and the TiDB Cloud console at the same time?

    Grafana and TiDB Cloud use different aggregation logic, so the aggregated values they display might differ. To get more fine-grained metric values, adjust the Min step setting of the query in Grafana.
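To see why the aggregation window alone can change what a panel shows, consider one hypothetical series sampled every minute and averaged over windows of different widths. The numbers are made up for illustration and do not describe how Grafana or TiDB Cloud aggregate internally.

```python
def average_by_window(samples: list[float], window: int) -> list[float]:
    """Average consecutive, non-overlapping groups of `window` samples."""
    return [
        sum(samples[i:i + window]) / len(samples[i:i + window])
        for i in range(0, len(samples), window)
    ]


# Hypothetical per-minute values of one gauge, with a short spike in minute 3.
per_minute = [10.0, 10.0, 100.0, 10.0, 10.0, 10.0]

fine = average_by_window(per_minute, 1)    # 1-minute windows keep the spike
coarse = average_by_window(per_minute, 3)  # 3-minute windows smooth it out

print(max(fine), max(coarse))  # 100.0 40.0
```

The same data yields a peak of 100 or 40 depending only on the window width. Lowering the Min step in Grafana lets the query use narrower windows, which is why it can bring the displayed values closer to the finer-grained view.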
