# Deploy Monitoring and Alerts for a TiDB Cluster
This document describes how to monitor a TiDB cluster deployed using TiDB Operator and how to configure alerts for the cluster.
## Monitor the TiDB cluster
TiDB cluster monitoring consists of two parts: monitoring data and dashboards. You can collect metrics using open-source tools such as Prometheus or VictoriaMetrics, and display the metrics using Grafana.
### Collect monitoring data

#### Collect monitoring data using Prometheus
To collect monitoring data using Prometheus, perform the following steps:
1. Deploy Prometheus Operator in your Kubernetes cluster by following the Prometheus Operator official documentation. This document uses version `v0.82.0` as an example.

2. Create a `PodMonitor` Custom Resource (CR) in the namespace of your TiDB cluster:

    ```yaml
    apiVersion: monitoring.coreos.com/v1
    kind: PodMonitor
    metadata:
      name: tidb-cluster-pod-monitor
      namespace: ${tidb_cluster_namespace}
      labels:
        monitor: tidb-cluster
    spec:
      jobLabel: "pingcap.com/component"
      namespaceSelector:
        matchNames:
          - ${tidb_cluster_namespace}
      selector:
        matchLabels:
          app.kubernetes.io/managed-by: tidb-operator
      podMetricsEndpoints:
        - interval: 15s
          # If TLS is enabled in the TiDB cluster, set the scheme to https. Otherwise, set it to http.
          scheme: https
          honorLabels: true
          # Configure tlsConfig only if TLS is enabled in the TiDB cluster.
          tlsConfig:
            ca:
              secret:
                name: db-cluster-client-secret
                key: ca.crt
            cert:
              secret:
                name: db-cluster-client-secret
                key: tls.crt
            keySecret:
              name: db-cluster-client-secret
              key: tls.key
          metricRelabelings:
            - action: labeldrop
              regex: container
          relabelings:
            - sourceLabels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
              action: keep
              regex: "true"
            - sourceLabels:
                - __meta_kubernetes_pod_name
                - __meta_kubernetes_pod_label_app_kubernetes_io_instance
                - __meta_kubernetes_pod_label_app_kubernetes_io_component
                - __meta_kubernetes_namespace
                - __meta_kubernetes_pod_annotation_prometheus_io_port
              action: replace
              regex: (.+);(.+);(.+);(.+);(.+)
              replacement: $1.$2-$3-peer.$4:$5
              targetLabel: __address__
            - sourceLabels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
              targetLabel: __metrics_path__
            - sourceLabels: [__meta_kubernetes_namespace]
              targetLabel: kubernetes_namespace
            - sourceLabels: [__meta_kubernetes_pod_label_app_kubernetes_io_instance]
              targetLabel: cluster
            - sourceLabels: [__meta_kubernetes_pod_name]
              targetLabel: instance
            - sourceLabels: [__meta_kubernetes_pod_label_app_kubernetes_io_component]
              targetLabel: component
            - sourceLabels:
                - __meta_kubernetes_namespace
                - __meta_kubernetes_pod_label_app_kubernetes_io_instance
              separator: '-'
              targetLabel: tidb_cluster
    ```

3. Create a `Prometheus` CR to collect metrics. Follow the Prometheus Operator official documentation and make sure the appropriate permissions are granted to the ServiceAccount:

    ```yaml
    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      name: prometheus
      namespace: monitoring
    spec:
      serviceAccountName: prometheus
      externalLabels:
        k8s_cluster: ${your_k8s_cluster_name}
      podMonitorSelector:
        matchLabels:
          monitor: tidb-cluster
      # An empty podMonitorNamespaceSelector means PodMonitors in all namespaces are collected.
      podMonitorNamespaceSelector: {}
    ```

4. Execute the following `kubectl port-forward` command to access Prometheus through port forwarding:

    ```shell
    kubectl port-forward -n monitoring prometheus-prometheus-0 9090:9090 &>/tmp/portforward-prometheus.log &
    ```

    Then, you can access http://localhost:9090/targets in your browser to view the monitoring data collection status.
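Besides the browser, you can also check the scrape targets from the command line through the Prometheus HTTP API. The following is a minimal sketch, not part of the official steps; it assumes the port-forward above is still running and that `jq` is installed locally:

```shell
# List the discovered scrape targets and their health status (requires jq).
curl -s http://localhost:9090/api/v1/targets | \
  jq '.data.activeTargets[] | {job: .labels.job, instance: .labels.instance, health: .health}'
```

If the TiDB cluster Pods are being scraped correctly, each component instance appears in the output with `"health": "up"`.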
#### Collect monitoring data using VictoriaMetrics
To collect monitoring data using VictoriaMetrics, perform the following steps:
1. Deploy VictoriaMetrics Operator in your Kubernetes cluster by following the VictoriaMetrics official documentation. This document uses version `v0.58.1` as an example.

2. Create a `VMSingle` Custom Resource (CR) to store monitoring data:

    ```yaml
    apiVersion: victoriametrics.com/v1beta1
    kind: VMSingle
    metadata:
      name: demo
      namespace: monitoring
    ```

3. Create a `VMAgent` CR to collect monitoring data:

    ```yaml
    apiVersion: victoriametrics.com/v1beta1
    kind: VMAgent
    metadata:
      name: demo
      namespace: monitoring
    spec:
      # Configure remoteWrite to write collected monitoring metrics to VMSingle.
      remoteWrite:
        - url: "http://vmsingle-demo.monitoring.svc:8429/api/v1/write"
      externalLabels:
        k8s_cluster: ${your_k8s_cluster_name}
      selectAllByDefault: true
    ```

4. Create a `VMPodScrape` CR in the TiDB cluster namespace to discover Pods and generate scrape configurations for VMAgent:

    ```yaml
    apiVersion: victoriametrics.com/v1beta1
    kind: VMPodScrape
    metadata:
      name: tidb-cluster-pod-scrape
      namespace: ${tidb_cluster_namespace}
    spec:
      jobLabel: "pingcap.com/component"
      namespaceSelector:
        matchNames:
          - ${tidb_cluster_namespace}
      selector:
        matchLabels:
          app.kubernetes.io/managed-by: tidb-operator
      podMetricsEndpoints:
        - interval: 15s
          # If TLS is enabled in the TiDB cluster, set the scheme to https. Otherwise, set it to http.
          scheme: https
          honorLabels: true
          # Configure tlsConfig only if TLS is enabled in the TiDB cluster.
          tlsConfig:
            ca:
              secret:
                name: db-cluster-client-secret
                key: ca.crt
            cert:
              secret:
                name: db-cluster-client-secret
                key: tls.crt
            keySecret:
              name: db-cluster-client-secret
              key: tls.key
          metricRelabelConfigs:
            - action: labeldrop
              regex: container
          relabelConfigs:
            - sourceLabels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
              action: keep
              regex: "true"
            - sourceLabels:
                - __meta_kubernetes_pod_name
                - __meta_kubernetes_pod_label_app_kubernetes_io_instance
                - __meta_kubernetes_pod_label_app_kubernetes_io_component
                - __meta_kubernetes_namespace
                - __meta_kubernetes_pod_annotation_prometheus_io_port
              action: replace
              regex: (.+);(.+);(.+);(.+);(.+)
              replacement: $1.$2-$3-peer.$4:$5
              targetLabel: __address__
            - sourceLabels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
              targetLabel: __metrics_path__
            - sourceLabels: [__meta_kubernetes_namespace]
              targetLabel: kubernetes_namespace
            - sourceLabels: [__meta_kubernetes_pod_label_app_kubernetes_io_instance]
              targetLabel: cluster
            - sourceLabels: [__meta_kubernetes_pod_name]
              targetLabel: instance
            - sourceLabels: [__meta_kubernetes_pod_label_app_kubernetes_io_component]
              targetLabel: component
            - sourceLabels:
                - __meta_kubernetes_namespace
                - __meta_kubernetes_pod_label_app_kubernetes_io_instance
              separator: '-'
              targetLabel: tidb_cluster
    ```

5. Execute the following `kubectl port-forward` command to access VMAgent through port forwarding:

    ```shell
    kubectl port-forward -n monitoring svc/vmagent-demo 8429:8429 &>/tmp/portforward-vmagent.log &
    ```

    Then, you can access http://localhost:8429/targets in your browser to view the monitoring data collection status.
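To confirm that metrics are actually flowing into VMSingle, you can query its Prometheus-compatible query API. The following is a minimal sketch, not part of the official steps; it assumes the `VMSingle` CR above, whose Service is named `vmsingle-demo`, and uses local port `8428` to avoid clashing with the VMAgent port-forward:

```shell
# Port-forward the VMSingle service (local 8428 -> service 8429) and
# query the `up` series for TiDB server Pods.
kubectl port-forward -n monitoring svc/vmsingle-demo 8428:8429 &>/tmp/portforward-vmsingle.log &
curl -s 'http://localhost:8428/api/v1/query' --data-urlencode 'query=up{component="tidb"}'
```

A non-empty result set means the scrape-and-remote-write pipeline from VMAgent to VMSingle is working.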
### Configure monitoring dashboards
To configure the monitoring dashboard, perform the following steps:
1. Follow the Grafana official documentation to deploy Grafana. This document uses version `12.0.0-security-01` as an example.

2. Execute the following `kubectl port-forward` command to access Grafana through port forwarding:

    ```shell
    kubectl port-forward -n ${namespace} ${grafana_pod_name} 3000:3000 &>/tmp/portforward-grafana.log &
    ```

    Then, you can access http://localhost:3000 in your browser. The default username and password are both `admin`. If you install Grafana using Helm, execute the following command to get the password of the `admin` user:

    ```shell
    kubectl get secret --namespace ${namespace} ${grafana_secret_name} -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
    ```

3. Add a data source of type Prometheus in Grafana and set the Prometheus Server URL based on your monitoring setup:

    - For Prometheus, set the URL to `http://prometheus-operated.monitoring.svc:9090`.
    - For VictoriaMetrics, set the URL to `http://vmsingle-demo.monitoring.svc:8429`.

4. Download Grafana dashboards for TiDB components using the `get-grafana-dashboards.sh` script and import them manually into Grafana.
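If you prefer scripting over clicking through the Grafana UI, the data source in step 3 can also be created through the Grafana HTTP API. This is a minimal sketch, assuming the port-forward from step 2 is still running and the default `admin` credentials; the data source name `tidb-cluster` is arbitrary:

```shell
# Create a Prometheus-type data source pointing at the Prometheus service.
curl -s -X POST http://admin:admin@localhost:3000/api/datasources \
  -H 'Content-Type: application/json' \
  -d '{
        "name": "tidb-cluster",
        "type": "prometheus",
        "url": "http://prometheus-operated.monitoring.svc:9090",
        "access": "proxy"
      }'
```

For VictoriaMetrics, the same call works with the `http://vmsingle-demo.monitoring.svc:8429` URL, because VMSingle exposes a Prometheus-compatible query API.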
## Configure alerts
You can manage and send alerts using Alertmanager. For specific deployment and configuration steps, refer to the Alertmanager official documentation.
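As a rough sketch of how the pieces connect when you use Prometheus Operator: the `Prometheus` CR created earlier can point at an Alertmanager Service, and alert rules can be defined in a `PrometheusRule` CR. The Service name `alertmanager-main`, its `web` port, the `ruleSelector` label, and the example rule below are all assumptions for illustration; adjust them to your actual Alertmanager deployment:

```yaml
# Sketch: fields to add to the Prometheus CR from the earlier step.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: monitoring
spec:
  # ... keep the fields from the earlier step ...
  alerting:
    alertmanagers:
      # Assumes an Alertmanager reachable via this Service; adjust to your deployment.
      - namespace: monitoring
        name: alertmanager-main
        port: web
  # Pick up PrometheusRule CRs carrying this label.
  ruleSelector:
    matchLabels:
      monitor: tidb-cluster
---
# Sketch: an example alert that fires when a TiDB server target stops reporting.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: tidb-alert-rules
  namespace: monitoring
  labels:
    monitor: tidb-cluster
spec:
  groups:
    - name: tidb
      rules:
        - alert: TiDBInstanceDown
          # The `component` label is set by the relabeling rules configured above.
          expr: up{component="tidb"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "TiDB instance {{ $labels.instance }} is down"
```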
## Monitor multiple clusters using Grafana
To monitor multiple clusters in Grafana, perform the following steps:
- In the Grafana dashboard, click Dashboard settings to open the Settings page.
- On the Settings page, select the tidb_cluster variable under Variables, and set the Hide property of the tidb_cluster variable to empty.
- Return to the dashboard. You will see a cluster selector dropdown. Each option follows the `${namespace}-${tidb_cluster_name}` format (see the query sketch after this list).
- Click Save dashboard to apply the changes.
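The dropdown works because the relabeling rules configured earlier attach a `tidb_cluster` label to every scraped series, so dashboard queries can filter on it. As a minimal sketch of the same filtering done directly against Prometheus, assuming the port-forward from the earlier step is still running and using a hypothetical namespace `tidb` with a cluster named `basic`:

```shell
# Count the TiDB server instances that are up in one specific cluster.
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum(up{component="tidb", tidb_cluster="tidb-basic"})'
```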