Aggregate Monitoring Data of Multiple TiDB Clusters
This document describes how to aggregate the monitoring data of multiple TiDB clusters using Thanos to provide a centralized monitoring service.
Thanos
Thanos is a high-availability solution for Prometheus that simplifies keeping Prometheus available.
Thanos provides the Thanos Query component as a unified query solution across multiple Prometheus clusters. You can use this component to aggregate the monitoring data of multiple TiDB clusters.
Aggregate monitoring data via Thanos Query
Configure Thanos Query
- Configure a Thanos Sidecar container for each TidbMonitor.
  A configuration example is as follows:
  kubectl -n ${namespace} apply -f https://raw.githubusercontent.com/pingcap/tidb-operator/v1.3.10/examples/monitor-with-thanos/tidb-monitor.yaml
- Deploy the Thanos Query component.
  - Download the thanos-query.yaml file for Thanos Query deployment:
    curl -sL -O https://raw.githubusercontent.com/pingcap/tidb-operator/v1.3.10/examples/monitor-with-thanos/thanos-query.yaml
  - Manually modify the --store parameter in the thanos-query.yaml file by updating basic-prometheus:10901 to basic-prometheus.${namespace}:10901. ${namespace} is the namespace where TidbMonitor is deployed.
  - Execute the kubectl apply command for deployment:
    kubectl -n ${thanos_namespace} apply -f thanos-query.yaml
    In the command above, ${thanos_namespace} is the namespace where the Thanos Query component is deployed.
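The manual edit of the --store parameter can also be scripted. The following is a minimal sketch, assuming the downloaded thanos-query.yaml contains the literal address basic-prometheus:10901 as described above. The function name patch_store_address is hypothetical and not part of TiDB Operator or Thanos.

```shell
# Rewrite the store address in a Thanos Query manifest so that it
# includes the namespace where TidbMonitor is deployed.
# Hypothetical helper. Usage: patch_store_address <file> <namespace>
patch_store_address() {
  file="$1"
  namespace="$2"
  # Write to a temporary file first; `sed -i` behaves differently across platforms.
  tmp="$(mktemp)"
  sed "s/basic-prometheus:10901/basic-prometheus.${namespace}:10901/g" "$file" > "$tmp" &&
    cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

Running the function a second time leaves the file unchanged, because the already patched address no longer matches the search pattern.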
In Thanos Query, each Prometheus instance corresponds to one store, which in turn corresponds to one TidbMonitor. After deploying Thanos Query, you can provide a unified query interface for monitoring data through the Thanos Query API.
Access the Thanos Query panel
To access the Thanos Query panel, execute the following command, and then access http://127.0.0.1:9090 in your browser:
kubectl port-forward -n ${thanos_namespace} svc/thanos-query 9090
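With the port-forward running, the same endpoint can also be queried from the command line, because Thanos Query serves the Prometheus-compatible HTTP API. The following is a minimal sketch. The function name thanos_instant_query is hypothetical, and the default address assumes the port-forward command above.

```shell
# Run an instant PromQL query against Thanos Query.
# Hypothetical helper. Usage: thanos_instant_query <promql> [host:port]
thanos_instant_query() {
  promql="$1"
  address="${2:-127.0.0.1:9090}"
  # -G sends the URL-encoded expression as a GET query parameter.
  curl -sS -G "http://${address}/api/v1/query" --data-urlencode "query=${promql}"
}

# Example: count the targets that are up, grouped by job.
# thanos_instant_query 'count by (job) (up)'
```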
If you want to access the Thanos Query panel using NodePort or LoadBalancer, refer to the following documents:
Configure Grafana
After deploying Thanos Query, to query the monitoring data of multiple TidbMonitors, take the following steps:
- Log in to Grafana.
- In the left navigation bar, select Configuration > Data Sources.
- Add or modify a data source of the Prometheus type.
- Set the URL under HTTP to http://thanos-query.${thanos_namespace}:9090.
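If you prefer to script this instead of using the UI, Grafana also accepts data source definitions through its HTTP API (POST /api/datasources). The following is a sketch under assumptions: the data source name thanos-query, the function name thanos_datasource_json, the Grafana address 127.0.0.1:3000, and the credential variables are all hypothetical.

```shell
# Print the JSON body for a Prometheus-type data source that points at
# Thanos Query. Hypothetical helper.
# Usage: thanos_datasource_json <thanos_namespace>
thanos_datasource_json() {
  printf '{"name":"thanos-query","type":"prometheus","access":"proxy","url":"http://thanos-query.%s:9090"}\n' "$1"
}

# Example (assumed Grafana address and credentials):
# thanos_datasource_json "${thanos_namespace}" |
#   curl -sS -u "${grafana_user}:${grafana_password}" \
#     -H 'Content-Type: application/json' \
#     -X POST "http://127.0.0.1:3000/api/datasources" -d @-
```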
Add or remove TidbMonitor
In Thanos Query, each Prometheus instance corresponds to one monitor store, which in turn corresponds to one TidbMonitor. To add, update, or remove a monitor store in Thanos Query, update the --store configuration of the Thanos Query component and perform a rolling update of the component:
spec:
  containers:
    - args:
        - query
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:9090
        - --log.level=debug
        - --log.format=logfmt
        - --query.replica-label=prometheus_replica
        - --query.replica-label=rule_replica
        - --store=<TidbMonitorName1>-prometheus.<TidbMonitorNs1>:10901
        - --store=<TidbMonitorName2>-prometheus.<TidbMonitorNs2>:10901
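The naming pattern of the --store arguments and the rolling update can be sketched as follows. This assumes that Thanos Query runs as a Deployment named thanos-query. Both function names are hypothetical.

```shell
# Print the --store argument for one TidbMonitor, following the
# <name>-prometheus.<namespace>:10901 pattern shown above.
# Hypothetical helper. Usage: store_arg <tidbmonitor_name> <tidbmonitor_namespace>
store_arg() {
  printf '%s\n' "--store=${1}-prometheus.${2}:10901"
}

# Trigger a rolling update of Thanos Query. Editing the args in the Pod
# template already starts a rollout, so this is only needed otherwise.
# Assumes a Deployment named thanos-query.
# Usage: restart_thanos_query <thanos_namespace>
restart_thanos_query() {
  kubectl -n "$1" rollout restart deployment/thanos-query
}
```

For example, store_arg basic tidb-cluster prints the argument for a TidbMonitor named basic in the tidb-cluster namespace.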
Configure archives and storage of Thanos Sidecar
Thanos Sidecar supports replicating monitoring data to S3 remote storage.
The configuration of the TidbMonitor CR is as follows:
spec:
  thanos:
    baseImage: thanosio/thanos
    version: v0.17.2
    objectStorageConfig:
      key: objectstorage.yaml
      name: thanos-objectstorage
You also need to create a Secret that holds the object storage configuration. An example is as follows:
apiVersion: v1
kind: Secret
metadata:
  name: thanos-objectstorage
type: Opaque
stringData:
  objectstorage.yaml: |
    type: S3
    config:
      bucket: "xxxxxx"
      endpoint: "xxxx"
      region: ""
      access_key: "xxxx"
      insecure: true
      signature_version2: true
      secret_key: "xxxx"
      put_user_metadata: {}
      http_config:
        idle_conn_timeout: 90s
        response_header_timeout: 2m
      trace:
        enable: true
      part_size: 41943040
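As an alternative to applying the Secret manifest, the same Secret can be created from a local file that contains only the object storage settings (the part under objectstorage.yaml above). The following is a minimal sketch. The function name create_objstore_secret is hypothetical, and the namespace should be the one where TidbMonitor is deployed, because Pods can only reference Secrets in their own namespace.

```shell
# Create the Secret referenced by spec.thanos.objectStorageConfig from a
# local file. The Secret name (thanos-objectstorage) and key
# (objectstorage.yaml) must match the TidbMonitor CR.
# Hypothetical helper. Usage: create_objstore_secret <namespace> <path-to-file>
create_objstore_secret() {
  kubectl -n "$1" create secret generic thanos-objectstorage \
    --from-file=objectstorage.yaml="$2"
}
```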
RemoteWrite mode
Besides aggregating data via Thanos Query, you can also push monitoring data to Thanos using Prometheus' RemoteWrite feature.
To enable the RemoteWrite mode, specify the Prometheus RemoteWrite configuration when you create the TidbMonitor CR. For example:
apiVersion: pingcap.com/v1alpha1
kind: TidbMonitor
metadata:
  name: basic
spec:
  clusters:
    - name: basic
  prometheus:
    baseImage: prom/prometheus
    version: v2.27.1
    remoteWrite:
      - url: "http://thanos-receiver:19291/api/v1/receive"
  grafana:
    baseImage: grafana/grafana
    version: 7.5.11
  initializer:
    baseImage: registry.cn-beijing.aliyuncs.com/tidb/tidb-monitor-initializer
    version: v6.1.0
  reloader:
    baseImage: registry.cn-beijing.aliyuncs.com/tidb/tidb-monitor-reloader
    version: v1.0.1
  prometheusReloader:
    baseImage: quay.io/prometheus-operator/prometheus-config-reloader
    version: v0.49.0
  imagePullPolicy: IfNotPresent
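After the CR is applied, one way to confirm that the RemoteWrite target was stored is to read it back from the cluster. The following is a minimal sketch. The function name remote_write_urls is hypothetical, and the field path follows the example CR above.

```shell
# Print the RemoteWrite URLs configured on a TidbMonitor.
# Hypothetical helper. Usage: remote_write_urls <namespace> <tidbmonitor_name>
remote_write_urls() {
  kubectl -n "$1" get tidbmonitor "$2" \
    -o jsonpath='{.spec.prometheus.remoteWrite[*].url}'
}

# Example: remote_write_urls "${namespace}" basic
```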
After RemoteWrite is enabled, Prometheus pushes the monitoring data to Thanos Receiver. For more information, refer to the design of Thanos Receiver.
For details on the deployment, refer to this example of integrating TidbMonitor with Thanos Receiver.