# Deploy Monitoring and Alerts for a TiDB Cluster
This document describes how to monitor a TiDB cluster deployed using TiDB Operator and configure alerts for the cluster.
## Monitor the TiDB cluster
You can monitor the TiDB cluster with Prometheus and Grafana. When you create a new TiDB cluster using TiDB Operator, you can deploy a separate monitoring system for the TiDB cluster. The monitoring system must run in the same namespace as the TiDB cluster and includes two components: Prometheus and Grafana.
For configuration details on the monitoring system, refer to TiDB Cluster Monitoring.
In TiDB Operator v1.1 or later versions, you can monitor a TiDB cluster on a Kubernetes cluster by using a simple Custom Resource (CR) file called `TidbMonitor`.
### Persist monitoring data
The monitoring data is not persisted by default. To persist the monitoring data, set `spec.persistent` to `true` in `TidbMonitor`. When you enable this option, you also need to set `spec.storageClassName` to a storage class that exists in the current cluster. This storage class must support persisting data; otherwise, there is a risk of data loss.
A configuration example is as follows:
```yaml
apiVersion: pingcap.com/v1alpha1
kind: TidbMonitor
metadata:
  name: basic
spec:
  clusters:
    - name: basic
  persistent: true
  storageClassName: ${storageClassName}
  storage: 5G
  prometheus:
    baseImage: prom/prometheus
    version: v2.27.1
    service:
      type: NodePort
  grafana:
    baseImage: grafana/grafana
    version: 7.5.11
    service:
      type: NodePort
  initializer:
    baseImage: pingcap/tidb-monitor-initializer
    version: v8.1.0
  reloader:
    baseImage: pingcap/tidb-monitor-reloader
    version: v1.0.1
  prometheusReloader:
    baseImage: quay.io/prometheus-operator/prometheus-config-reloader
    version: v0.49.0
  imagePullPolicy: IfNotPresent
```
To verify the PVC status, run the following command:
```shell
kubectl get pvc -l app.kubernetes.io/instance=basic,app.kubernetes.io/component=monitor -n ${namespace}
```

The expected output is as follows:

```
NAME            STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
basic-monitor   Bound    pvc-6db79253-cc9e-4730-bbba-ba987c29db6f   5G         RWO            standard       51s
```
### Customize the Prometheus configuration
You can customize the Prometheus configuration by using a customized configuration file or by adding extra options to the command.
#### Use a customized configuration file
1. Create a ConfigMap for your customized configuration, and set the key name of `data` to `prometheus-config`.
2. Set `spec.prometheus.config.configMapRef.name` and `spec.prometheus.config.configMapRef.namespace` to the name and namespace of the customized ConfigMap respectively.
3. Check whether TidbMonitor has enabled dynamic configuration. If not, you need to restart the TidbMonitor Pod to reload the configuration.
For the complete configuration, refer to the tidb-operator example.
#### Add extra options to the command
To add extra options to the command that starts Prometheus, configure `spec.prometheus.config.commandOptions`.
For the complete configuration, refer to the tidb-operator example.
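For example, a TidbMonitor fragment that extends Prometheus's data retention might look like the following. The flags shown are standard Prometheus command-line options, but verify them against the Prometheus version you deploy:

```yaml
spec:
  prometheus:
    config:
      commandOptions:
        - --storage.tsdb.retention.time=30d   # keep monitoring data for 30 days
        - --log.level=debug                   # increase log verbosity for troubleshooting
```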
### Access the Grafana monitoring dashboard
You can run the `kubectl port-forward` command to access the Grafana monitoring dashboard:
```shell
kubectl port-forward -n ${namespace} svc/${cluster_name}-grafana 3000:3000 &>/tmp/portforward-grafana.log &
```
Then open http://localhost:3000 in your browser and log on with the default username and password `admin`.
You can also set `spec.grafana.service.type` to `NodePort` or `LoadBalancer`, and then view the monitoring dashboard through `NodePort` or `LoadBalancer`.
If you do not need Grafana, you can delete the `spec.grafana` section from `TidbMonitor` during deployment. In this case, you need to use other existing or newly deployed data visualization tools to access the monitoring data directly.
### Access the Prometheus monitoring data
To access the monitoring data directly, run the `kubectl port-forward` command to access Prometheus:
```shell
kubectl port-forward -n ${namespace} svc/${cluster_name}-prometheus 9090:9090 &>/tmp/portforward-prometheus.log &
```
Then open http://localhost:9090 in your browser or access this address via a client tool.
You can also set `spec.prometheus.service.type` to `NodePort` or `LoadBalancer`, and then view the monitoring data through `NodePort` or `LoadBalancer`.
### Set kube-prometheus and AlertManager
By default, the Nodes-Info and Pods-Info monitoring dashboards are built into TidbMonitor Grafana for viewing the corresponding Kubernetes monitoring metrics.
To view these monitoring metrics in TidbMonitor Grafana, take the following steps:
1. Deploy Kubernetes cluster monitoring manually. There are multiple ways to deploy Kubernetes cluster monitoring. To use kube-prometheus for deployment, see the kube-prometheus documentation.
2. Set `TidbMonitor.spec.kubePrometheusURL` to obtain Kubernetes monitoring data.
Similarly, you can configure TidbMonitor to push the monitoring alert to AlertManager.
```yaml
apiVersion: pingcap.com/v1alpha1
kind: TidbMonitor
metadata:
  name: basic
spec:
  clusters:
    - name: basic
  kubePrometheusURL: http://prometheus-k8s.monitoring:9090
  alertmanagerURL: alertmanager-main.monitoring:9093
  prometheus:
    baseImage: prom/prometheus
    version: v2.27.1
    service:
      type: NodePort
  grafana:
    baseImage: grafana/grafana
    version: 7.5.11
    service:
      type: NodePort
  initializer:
    baseImage: pingcap/tidb-monitor-initializer
    version: v8.1.0
  reloader:
    baseImage: pingcap/tidb-monitor-reloader
    version: v1.0.1
  prometheusReloader:
    baseImage: quay.io/prometheus-operator/prometheus-config-reloader
    version: v0.49.0
  imagePullPolicy: IfNotPresent
```
### Enable Ingress
This section introduces how to enable Ingress for TidbMonitor. Ingress is an API object that exposes HTTP and HTTPS routes from outside the cluster to services within the cluster.
#### Prerequisites
Before using Ingress, you need to install an Ingress controller; simply creating the Ingress resource does not take effect. You can deploy the NGINX Ingress controller, or choose from various other Ingress controllers.
For more information, see Ingress Prerequisites.
#### Access TidbMonitor using Ingress
Currently, TidbMonitor provides a method to expose the Prometheus/Grafana service through Ingress. For details about Ingress, see Ingress official documentation.
The following example shows how to enable Ingress for Prometheus and Grafana in TidbMonitor:
```yaml
apiVersion: pingcap.com/v1alpha1
kind: TidbMonitor
metadata:
  name: ingress-demo
spec:
  clusters:
    - name: demo
  persistent: false
  prometheus:
    baseImage: prom/prometheus
    version: v2.27.1
    ingress:
      hosts:
        - example.com
      annotations:
        foo: "bar"
  grafana:
    baseImage: grafana/grafana
    version: 7.5.11
    service:
      type: ClusterIP
    ingress:
      hosts:
        - example.com
      annotations:
        foo: "bar"
  initializer:
    baseImage: pingcap/tidb-monitor-initializer
    version: v8.1.0
  reloader:
    baseImage: pingcap/tidb-monitor-reloader
    version: v1.0.1
  prometheusReloader:
    baseImage: quay.io/prometheus-operator/prometheus-config-reloader
    version: v0.49.0
  imagePullPolicy: IfNotPresent
```
To modify the setting of Ingress Annotations, configure `spec.prometheus.ingress.annotations` and `spec.grafana.ingress.annotations`. If you use the default NGINX Ingress, see NGINX Ingress Controller Annotation for details.
The TidbMonitor Ingress setting also supports TLS. The following example shows how to configure TLS for Ingress. See Ingress TLS for details.
```yaml
apiVersion: pingcap.com/v1alpha1
kind: TidbMonitor
metadata:
  name: ingress-demo
spec:
  clusters:
    - name: demo
  persistent: false
  prometheus:
    baseImage: prom/prometheus
    version: v2.27.1
    ingress:
      hosts:
        - example.com
      tls:
        - hosts:
            - example.com
          secretName: testsecret-tls
  grafana:
    baseImage: grafana/grafana
    version: 7.5.11
    service:
      type: ClusterIP
  initializer:
    baseImage: pingcap/tidb-monitor-initializer
    version: v8.1.0
  reloader:
    baseImage: pingcap/tidb-monitor-reloader
    version: v1.0.1
  prometheusReloader:
    baseImage: quay.io/prometheus-operator/prometheus-config-reloader
    version: v0.49.0
  imagePullPolicy: IfNotPresent
```
The TLS Secret must include the `tls.crt` and `tls.key` keys, which contain the certificate and private key used for TLS. For example:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: testsecret-tls
  namespace: ${namespace}
data:
  tls.crt: base64 encoded cert
  tls.key: base64 encoded key
type: kubernetes.io/tls
```
In a Kubernetes cluster deployed on public cloud, you can usually configure a LoadBalancer to access Ingress through a domain name. If you cannot configure the LoadBalancer service (for example, when you use NodePort as the service type of Ingress), you can access the service in a way equivalent to the following command:
```shell
curl -H "Host: example.com" ${node_ip}:${NodePort}
```
## Configure alert
When Prometheus is deployed with a TiDB cluster, some default alert rules are automatically imported. You can view all alert rules and statuses in the current system by accessing the Alerts page of Prometheus through a browser.
Custom configuration of alert rules is supported. You can modify the alert rules by taking the following steps:
1. When deploying the monitoring system for the TiDB cluster, set `spec.reloader.service.type` to `NodePort` or `LoadBalancer`.
2. Access the `reloader` service through `NodePort` or `LoadBalancer`. Click the `Files` button above to select the alert rule file to be modified, and make the custom configuration. Click `Save` after the modification.
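As a reference for the format used in these rule files, a minimal Prometheus alert rule looks like the following. The rule name, expression, threshold, and labels here are hypothetical; the built-in TiDB rules follow the same structure:

```yaml
groups:
  - name: tidb-example-alerts
    rules:
      - alert: TiDBServerDown               # hypothetical rule name
        expr: up{job="tidb"} == 0           # fires when a TiDB instance is unreachable
        for: 1m                             # must hold for 1 minute before firing
        labels:
          severity: critical
        annotations:
          summary: "TiDB instance {{ $labels.instance }} is down"
```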
The default Prometheus and alert configuration do not support sending alert messages. To send an alert message, you can integrate Prometheus with any tool that supports Prometheus alerts. It is recommended to manage and send alert messages via AlertManager.
- If you already have an available AlertManager service in your existing infrastructure, you can set the value of `spec.alertmanagerURL` to the address of AlertManager, which will be used by Prometheus. For details, refer to Set kube-prometheus and AlertManager.
- If no AlertManager service is available, or if you want to deploy a separate AlertManager service, refer to the Prometheus official document.
## Monitor multiple clusters
Starting from TiDB Operator 1.2, TidbMonitor supports monitoring multiple clusters across namespaces.
### Configure the monitoring of multiple clusters using YAML files
For the clusters to be monitored, regardless of whether TLS is enabled, you can monitor them by configuring the TidbMonitor YAML file.
A configuration example is as follows:
```yaml
apiVersion: pingcap.com/v1alpha1
kind: TidbMonitor
metadata:
  name: basic
spec:
  clusterScoped: true
  clusters:
    - name: ns1
      namespace: ns1
    - name: ns2
      namespace: ns2
  persistent: true
  storage: 5G
  prometheus:
    baseImage: prom/prometheus
    version: v2.27.1
    service:
      type: NodePort
  grafana:
    baseImage: grafana/grafana
    version: 7.5.11
    service:
      type: NodePort
  initializer:
    baseImage: pingcap/tidb-monitor-initializer
    version: v8.1.0
  reloader:
    baseImage: pingcap/tidb-monitor-reloader
    version: v1.0.1
  prometheusReloader:
    baseImage: quay.io/prometheus-operator/prometheus-config-reloader
    version: v0.49.0
  imagePullPolicy: IfNotPresent
```
For a complete configuration example, refer to Example in the TiDB Operator repository.
### Monitor multiple clusters using Grafana
If the `tidb-monitor-initializer` image is earlier than v4.0.14 or v5.0.3, to monitor multiple clusters, you can take the following steps in each Grafana Dashboard:
1. On the Grafana Dashboard, click Dashboard settings to open the Settings panel.
2. On the Settings panel, select the tidb_cluster variable from Variables, and then set the Hide property of the tidb_cluster variable to the null option in the drop-down list.
3. Go back to the current Grafana Dashboard (changes to the Hide property cannot be saved currently), and you can see the drop-down list for cluster selection. The cluster name in the drop-down list is in the `${namespace}-${name}` format.
If you need to save changes to the Grafana Dashboard, Grafana must be 6.5 or later, and TiDB Operator must be v1.2.0-rc.2 or later.