TiDB Cloud Built-in Alerting

TiDB Cloud provides you with an easy way to view alerts, edit alert rules, and subscribe to alert notification emails.

This document describes how to perform these operations and lists the TiDB Cloud built-in alert conditions for your reference.

View alerts

In TiDB Cloud, you can view both active and closed alerts on the Alerts page.

  1. In the TiDB Cloud console, navigate to the Clusters page of your project.

  2. Click the name of the target cluster. The cluster overview page is displayed.

  3. Click Alerts in the left navigation pane.

  4. The Alerts page displays the active alerts by default. For each active alert, you can view information such as the alert name, trigger time, and duration.

  5. To also view closed alerts, click the Status drop-down list and select Closed or All.

Edit alert rules

In TiDB Cloud, you can edit alert rules by enabling or disabling alerts or by updating alert thresholds.

  1. On the Alerts page, click Edit Rules.

  2. Disable or enable alert rules as needed.

  3. Click Edit to update the threshold of an alert rule.

Subscribe to alert notification emails

To get alert notification emails for clusters in your project, take the following steps:

  1. On the Alerts page, click Subscribe Alerts.

  2. Enter your email address, and then click Subscribe.

Alternatively, you can add the subscription from the Alert Subscription page as follows:

  1. Log in to the TiDB Cloud console.
  2. Click in the lower-left corner, switch to the target project if you have multiple projects, and then click Project Settings.
  3. On the Project Settings page of your project, click Alert Subscription in the left navigation pane.
  4. Click Add Subscriber, enter your email address in the displayed dialog, and then click Add.

If an alert condition remains unchanged, the alert sends email notifications every 3 hours.
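
To make the repeat interval concrete, the following minimal Python sketch lists the times at which emails would go out for an alert that stays active. It is an illustration only: the function name is hypothetical, and the sketch assumes that the first email is sent at trigger time, which this document does not state.

```python
from datetime import datetime, timedelta

# Repeat interval stated in this document.
REPEAT_INTERVAL = timedelta(hours=3)


def notification_times(triggered_at, closed_at, interval=REPEAT_INTERVAL):
    """Return the send times for an alert active from triggered_at
    until closed_at (exclusive).

    Assumption (not from the TiDB Cloud documentation): the first
    email is sent at the moment the alert triggers.
    """
    times = []
    current = triggered_at
    while current < closed_at:
        times.append(current)
        current += interval
    return times
```

For example, under these assumptions an alert that triggers at 00:00 and closes at 07:00 would produce emails at 00:00, 03:00, and 06:00.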

Unsubscribe from alert notification emails

If you no longer want to receive alert notification emails for clusters in your project, take the following steps:

  1. Log in to the TiDB Cloud console.
  2. Click in the lower-left corner, switch to the target project if you have multiple projects, and then click Project Settings.
  3. On the Project Settings page of your project, click Alert Subscription in the left navigation pane.
  4. Locate your email address and click Unsubscribe.
  5. Click Delete to confirm the unsubscription.

TiDB Cloud built-in alert conditions

The following tables list the TiDB Cloud built-in alert conditions and the corresponding recommended actions.

Resource usage alerts

Condition | Recommended Action
Total TiDB node memory utilization across cluster exceeded 70% for 10 minutes | Consider increasing the node number or node size for TiDB to reduce the memory usage percentage of the current workload.
Total TiKV node memory utilization across cluster exceeded 70% for 10 minutes | Consider increasing the node number or node size for TiKV to reduce the memory usage percentage of the current workload.
Total TiFlash node memory utilization across cluster exceeded 70% for 10 minutes | Consider increasing the node number or node size for TiFlash to reduce the memory usage percentage of the current workload.
Total TiDB node CPU utilization exceeded 80% for 10 minutes | Consider increasing the node number or node size for TiDB to reduce the CPU usage percentage of the current workload.
Total TiKV node CPU utilization exceeded 80% for 10 minutes | Consider increasing the node number or node size for TiKV to reduce the CPU usage percentage of the current workload.
Total TiFlash node CPU utilization exceeded 80% for 10 minutes | Consider increasing the node number or node size for TiFlash to reduce the CPU usage percentage of the current workload.
TiKV storage utilization exceeds 80% | Consider increasing the node number or node storage size for TiKV to increase your storage capacity.
TiFlash storage utilization exceeds 80% | Consider increasing the node number or node storage size for TiFlash to increase your storage capacity.
Max memory utilization across TiDB nodes exceeded 70% for 10 minutes | Consider checking if there is any hotspot in the cluster or increasing the node number or node size for TiDB to reduce the memory usage percentage of the current workload.
Max memory utilization across TiKV nodes exceeded 70% for 10 minutes | Consider checking if there is any hotspot in the cluster or increasing the node number or node size for TiKV to reduce the memory usage percentage of the current workload.
Max CPU utilization across TiDB nodes exceeded 80% for 10 minutes | Consider checking if there is any hotspot in the cluster or increasing the node number or node size for TiDB to reduce the CPU usage percentage of the current workload.
Max CPU utilization across TiKV nodes exceeded 80% for 10 minutes | Consider checking if there is any hotspot in the cluster or increasing the node number or node size for TiKV to reduce the CPU usage percentage of the current workload.
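
Conditions of the form "exceeded 70% for 10 minutes" describe a sustained threshold: the metric must stay above the limit for the whole period, not just spike once. The following Python sketch shows one way to evaluate such a condition against your own metric samples. It is an interpretation for illustration and does not describe how TiDB Cloud evaluates alerts internally. The function name and the sample format are hypothetical.

```python
from datetime import datetime, timedelta


def exceeded_for(samples, threshold, duration):
    """Return True if the metric stayed above `threshold` for the
    trailing `duration`.

    samples: list of (timestamp, value) tuples sorted by timestamp.
    Returns False when the samples do not cover the whole window.
    """
    if not samples:
        return False
    latest = samples[-1][0]
    window_start = latest - duration
    # The data must reach back at least to the start of the window.
    if samples[0][0] > window_start:
        return False
    window = [value for ts, value in samples if ts >= window_start]
    return all(value > threshold for value in window)
```

For example, `exceeded_for(samples, 70.0, timedelta(minutes=10))` mirrors the memory conditions in the table, and `exceeded_for(samples, 80.0, timedelta(minutes=10))` mirrors the CPU conditions.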

Data migration alerts

Condition | Recommended Action
Data migration job met error during data export | Check the error and see Troubleshoot data migration for help.
Data migration job met error during data import | Check the error and see Troubleshoot data migration for help.
Data migration job met error during incremental migration | Check the error and see Troubleshoot data migration for help.
Data migration job has been paused for more than 6 hours during incremental migration | Data migration job has been paused for more than 6 hours during data incremental migration. The binlog in the upstream database might be purged (depending on your database binlog purge strategy) and might cause incremental migration to fail. See Troubleshoot data migration for help.
Replication lag is larger than 10 minutes and still increasing for more than 20 minutes | See Troubleshoot data migration for help.
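
The replication lag condition combines a threshold with a trend. The following Python sketch shows one possible reading of it: over the trailing 20 minutes, every lag sample is above 10 minutes and each sample is larger than the previous one. This reading, the function name, and the sample format are assumptions made for illustration. The exact evaluation logic used by TiDB Cloud is not described in this document.

```python
def lag_alert(lag_samples, lag_threshold_s=600, window_s=1200):
    """Return True if lag stayed above the threshold and kept increasing
    over the trailing window.

    lag_samples: list of (elapsed_seconds, lag_seconds) tuples sorted
    by elapsed_seconds. Returns False when the samples do not cover
    the whole window.
    """
    if not lag_samples:
        return False
    latest_t = lag_samples[-1][0]
    start = latest_t - window_s
    # The data must reach back at least to the start of the window.
    if lag_samples[0][0] > start:
        return False
    window = [lag for t, lag in lag_samples if t >= start]
    above = all(lag > lag_threshold_s for lag in window)
    increasing = all(b > a for a, b in zip(window, window[1:]))
    return above and increasing
```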

Changefeed alerts

Condition | Recommended Action
The changefeed latency exceeds 600 seconds. | Check the changefeed status on the Changefeed page and Changefeed Detail page of the TiDB Cloud console, where you can find some error messages to help diagnose this issue.
Possible reasons that can trigger this alert include:
  • The overall traffic in the upstream has increased, causing the existing changefeed specification to be insufficient to handle it. If the traffic increase is temporary, the changefeed latency will automatically recover after the traffic returns to normal. If the traffic increase is continuous, you need to scale up the changefeed.
  • The downstream or network is abnormal. In this case, resolve this abnormality first.
  • Tables lack indexes if the downstream is RDS, which might cause low write performance and high latency. In this case, you need to add the necessary indexes to the upstream or downstream.
If the problem cannot be fixed from your side, you can contact TiDB Cloud Support for further assistance.
The changefeed status is FAILED. | Check the changefeed status on the Changefeed page and Changefeed Detail page of the TiDB Cloud console, where you can find some error messages to help diagnose this issue.
If the problem cannot be fixed from your side, you can contact TiDB Cloud Support for further assistance.
The changefeed status is WARNING. | Check the changefeed status on the Changefeed page and Changefeed Detail page of the TiDB Cloud console, where you can find some error messages to help diagnose this issue.
If the problem cannot be fixed from your side, you can contact TiDB Cloud Support for further assistance.
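
The three changefeed conditions can be summarized as simple checks on the changefeed status and latency. The following Python sketch returns the conditions from the table that a given state would match. The function name and its inputs are hypothetical, and the sketch assumes that any status string other than FAILED or WARNING raises no status alert.

```python
# Latency limit stated in the table above.
LATENCY_LIMIT_SECONDS = 600


def changefeed_alerts(status, latency_seconds):
    """Return the alert conditions from the table that match the given
    changefeed status and latency.

    Assumption: statuses other than "FAILED" and "WARNING" raise no
    status alert.
    """
    alerts = []
    if latency_seconds > LATENCY_LIMIT_SECONDS:
        alerts.append("The changefeed latency exceeds 600 seconds.")
    if status == "FAILED":
        alerts.append("The changefeed status is FAILED.")
    elif status == "WARNING":
        alerts.append("The changefeed status is WARNING.")
    return alerts
```

A latency alert and a status alert can match at the same time, which is why the function returns a list rather than a single value.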
