Handle Alerts

This document describes how to handle the alerts in DM.

DM_worker_offline

  • Description:

    If a DM-worker node is offline for more than one hour, this alert is triggered. In a high-availability architecture, this alert might not directly interrupt the task but increases the risk of interruption.

  • Solution:

    You can take the following steps to handle the alert:

    1. View the working status of the corresponding DM-worker node.
    2. Check whether the node is connected.
    3. Troubleshoot errors through logs.
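The connectivity check in step 2 can be partly scripted. The following is a minimal sketch, not an official DM tool: it only tests whether a TCP connection to the node can be opened. The function name `is_reachable`, the address, and the port 8262 (assumed here to be the DM-worker port; use the one from your own deployment) are illustrative.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical address; 8262 is an assumed DM-worker port.
    host, port = "127.0.0.1", 8262
    print(f"{host}:{port} reachable: {is_reachable(host, port)}")
```

A successful connection only shows that the port accepts connections; the working status of the node and its logs still need to be checked as described in steps 1 and 3.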

DM_DDL_error

  • Description:

    This alert is triggered when an error occurs while DM is processing sharding DDL operations.

  • Solution:

    Refer to Troubleshoot DM.

DM_pending_DDL

  • Description:

    If a sharding DDL operation is pending for more than one hour, this alert is triggered.

  • Solution:

    In some scenarios, a pending sharding DDL operation might be what users expect. Otherwise, refer to Handle Sharding DDL Locks Manually in DM for the solution.

DM_task_state

  • Description:

    When a sub-task of DM-worker is in the Paused state for over 20 minutes, an alert is triggered.

  • Solution:

    Refer to Troubleshoot DM.

Note:

Currently, DM v2.0 does not support enabling the relay log feature.

DM_relay_process_exits_with_error

  • Description:

    When the relay log processing unit encounters an error, this unit moves to the Paused state, and an alert is triggered immediately.

  • Solution:

    Refer to Troubleshoot DM.

DM_remain_storage_of_relay_log

  • Description:

    When the free space of the disk where the relay log is located is less than 10G, an alert is triggered.

  • Solutions:

    You can take the following methods to handle the alert:
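To confirm the condition described above before freeing space, a minimal sketch can compare the free space of the relay log disk with the alert threshold. It reads "10G" as 10 GiB, which is an assumption, and the names `THRESHOLD_BYTES` and `relay_disk_is_low` as well as the example path are hypothetical.

```python
import shutil

# Assumption: the "10G" threshold is interpreted as 10 GiB.
THRESHOLD_BYTES = 10 * 1024**3


def relay_disk_is_low(path: str, threshold: int = THRESHOLD_BYTES) -> bool:
    """Return True if the filesystem that holds `path` has less free space than `threshold` bytes."""
    return shutil.disk_usage(path).free < threshold


if __name__ == "__main__":
    # Hypothetical location; replace with the actual relay log directory.
    relay_dir = "."
    free_gib = shutil.disk_usage(relay_dir).free / 1024**3
    print(f"free: {free_gib:.1f} GiB, below threshold: {relay_disk_is_low(relay_dir)}")
```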

DM_relay_log_data_corruption

  • Description:

    When the relay log processing unit validates the binlog event read from the upstream and detects abnormal checksum information, this unit moves to the Paused state, and an alert is triggered immediately.

  • Solution:

    Refer to Troubleshoot DM.

DM_fail_to_read_binlog_from_master

  • Description:

    If an error occurs when the relay log processing unit tries to read the binlog event from the upstream, this unit moves to the Paused state, and an alert is triggered immediately.

  • Solution:

    Refer to Troubleshoot DM.

DM_fail_to_write_relay_log

  • Description:

    If an error occurs when the relay log processing unit tries to write the binlog event into the relay log file, this unit moves to the Paused state, and an alert is triggered immediately.

  • Solution:

    Refer to Troubleshoot DM.

DM_binlog_file_gap_between_master_relay

  • Description:

    When the number of the binlog files in the current upstream MySQL/MariaDB exceeds that of the latest binlog files pulled by the relay log processing unit by more than 1 for 10 minutes, an alert is triggered.

  • Solution:

    Refer to Troubleshoot DM.
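The file-gap condition used by this alert can be illustrated with a short sketch. It assumes that binlog file names end in a numeric sequence suffix, such as `mysql-bin.000012`; the function names are hypothetical and this is not how DM itself computes the metric. The sketch evaluates a single sample only, whereas the alert additionally requires the gap to persist for 10 minutes.

```python
def binlog_seq(name: str) -> int:
    """Extract the numeric suffix from a binlog file name such as 'mysql-bin.000012'."""
    _, _, suffix = name.rpartition(".")
    return int(suffix)


def file_gap(ahead: str, behind: str) -> int:
    """Return how many binlog files `ahead` is in front of `behind`."""
    return binlog_seq(ahead) - binlog_seq(behind)


def gap_exceeds(ahead: str, behind: str, limit: int = 1) -> bool:
    """Return True if the gap is more than `limit` files (the alert uses more than 1)."""
    return file_gap(ahead, behind) > limit


if __name__ == "__main__":
    upstream = "mysql-bin.000012"  # hypothetical latest upstream binlog file
    pulled = "mysql-bin.000010"    # hypothetical latest file pulled by the relay unit
    print(file_gap(upstream, pulled), gap_exceeds(upstream, pulled))
```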

DM_dump_process_exists_with_error

  • Description:

    When the Dump processing unit encounters an error, this unit moves to the Paused state, and an alert is triggered immediately.

  • Solution:

    Refer to Troubleshoot DM.

DM_load_process_exists_with_error

  • Description:

    When the Load processing unit encounters an error, this unit moves to the Paused state, and an alert is triggered immediately.

  • Solution:

    Refer to Troubleshoot DM.

DM_sync_process_exists_with_error

  • Description:

    When the binlog replication processing unit encounters an error, this unit moves to the Paused state, and an alert is triggered immediately.

  • Solution:

    Refer to Troubleshoot DM.

DM_binlog_file_gap_between_master_syncer

  • Description:

    When the number of the binlog files in the current upstream MySQL/MariaDB exceeds that of the latest binlog files processed by the binlog replication processing unit by more than 1 for 10 minutes, an alert is triggered.

  • Solution:

    Refer to Handle Performance Issues.

DM_binlog_file_gap_between_relay_syncer

  • Description:

    When the number of the binlog files in the current relay log processing unit exceeds that of the latest binlog files processed by the binlog replication processing unit by more than 1 for 10 minutes, an alert is triggered.

  • Solution:

    Refer to Handle Performance Issues.