Sign InTry Free

Migrate from a MySQL-compatible Database - Taking Amazon Aurora MySQL as an Example

This document describes how to migrate from Amazon Aurora MySQL to TiDB by using TiDB Data Migration (DM).

The information of the Aurora cluster in the example is as follows:

ClusterEndpointPortRoleVersion
Aurora-1test-dm-2-0.cluster-czrtqco96yc6.us-east-2.rds.amazonaws.com3306WriterAurora (MySQL)-5.7.12
Aurora-1test-dm-2-0.cluster-ro-czrtqco96yc6.us-east-2.rds.amazonaws.com3306ReaderAurora (MySQL)-5.7.12
Aurora-2test-dm-2-0-2.cluster-czrtqco96yc6.us-east-2.rds.amazonaws.com3306WriterAurora (MySQL)-5.7.12
Aurora-2test-dm-2-0-2.cluster-ro-czrtqco96yc6.us-east-2.rds.amazonaws.com3306ReaderAurora (MySQL)-5.7.12

The data and migration plan of the Aurora cluster are as follows:

ClusterDatabaseTableMigration
Aurora-1migrate_met1Yes
Aurora-1ignore_meignore_tableNo
Aurora-2migrate_met2Yes
Aurora-2ignore_meignore_tableNo

The Aurora users in this migration are as follows:

ClusterUserPassword
Aurora-1root12345678
Aurora-2root12345678

The TiDB cluster information in the example is as follows. The TiDB cluster is deployed using TiDB Cloud.

NodePortVersion
tidb.6657c286.23110bc6.us-east-1.prod.aws.tidbcloud.com4000v4.0.2

The TiDB users in this migration are as follows:

UserPassword
root87654321

After migration, the `migrate_me`.`t1` and `migrate_me`.`t2` tables are expected to exist in the TiDB cluster. The data of these tables is consistent with that of the Aurora cluster.

Step 1: Precheck

To ensure a successful migration, you need to do prechecks before starting the migration. This section provides the precheck list and solutions to DM and Aurora components.

DM nodes deployment

As the hub of data migration, DM needs to connect to the upstream Aurora cluster and the downstream TiDB cluster. Therefore, you need to use the MySQL client to check whether the nodes in which DM is to be deployed can connect to the upstream and downstream. In addition, for details of DM requirements on hardware, software, and the node number, see DM Cluster Software and Hardware Recommendations.

Aurora

DM relies on the ROW-formatted binlog for incremental replication. See Enable binary for an Aurora Cluster for the configuration instruction.

If GTID is enabled in Aurora, you can migrate data based on GTID. For how to enable it, see Configuring GTID-Based Replication for an Aurora MySQL Cluster. To migrate data based on GTID, you need to set enable-gtid to true in the configuration file of data source in step 3.

Step 2: Deploy the DM cluster

DM can be deployed in multiple ways. Currently, it is recommended to use TiUP to deploy a DM cluster. For the specific deployment method, see Deploy DM cluster using TiUP. This example has two data sources, so at least two DM-worker nodes need to be deployed.

After deployment, you need to record the IP and service port of any DM-master node (8261 by default) for dmctl to connect. This example uses 127.0.0.1:8261. Check the DM status through TiUP using dmctl:

tiup dmctl --master-addr 127.0.0.1:8261 list-member

The number of masters and workers in the returned result is consistent with the number of deployed nodes:

{
    "result": true,
    "msg": "",
    "members": [
        {
            "leader": {
                ...
            }
        },
        {
            "master": {
                "msg": "",
                "masters": [
                    ...
                ]
            }
        },
        {
            "worker": {
                "msg": "",
                "workers": [
                    ...
                ]
            }
        }
    ]
}

Step 3: Configure the data source

Save the following configuration files of data source according to the example, in which the value of source-id will be used in the task configuration in step 4.

The content of source1.yaml:

# Aurora-1
source-id: "aurora-replica-01"

# To migrate data based on GTID, you need to set this item to true.
enable-gtid: false

from:
  host: "test-dm-2-0.cluster-czrtqco96yc6.us-east-2.rds.amazonaws.com"
  user: "root"
  password: "12345678"
  port: 3306

The content of source2.yaml:

# Aurora-2
source-id: "aurora-replica-02"

enable-gtid: false

from:
  host: "test-dm-2-0-2.cluster-czrtqco96yc6.us-east-2.rds.amazonaws.com"
  user: "root"
  password: "12345678"
  port: 3306

See Migrate Data Using Data Migration - Create Data Source, and use dmctl to add two data sources through TiUP.

tiup dmctl --master-addr 127.0.0.1:8261 operate-source create dm-test/source1.yaml
tiup dmctl --master-addr 127.0.0.1:8261 operate-source create dm-test/source2.yaml

When the data sources are successfully added, the return information of each data source includes a DM-worker bound to it.

{
    "result": true,
    "msg": "",
    "sources": [
        {
            "result": true,
            "msg": "",
            "source": "aurora-replica-01",
            "worker": "one-dm-worker-ID"
        }
    ]
}

Step 4: Configure the task

This example migrates the existing data in Aurora and replicates incremental data to TiDB in real time, which is the full data migration plus incremental replication mode. According to the TiDB cluster information above, the added source-id, and the table to be migrated, save the following task configuration file task.yaml:

# The task name. You need to use a different name for each of the multiple tasks that run simultaneously.
name: "test"
# The full data migration plus incremental replication task mode.
task-mode: "all"
# The downstream TiDB configuration information.
target-database:
  host: "tidb.6657c286.23110bc6.us-east-1.prod.aws.tidbcloud.com"
  port: 4000
  user: "root"
  password: "87654321"

# Configuration of all the upstream MySQL instances required by the current data migration task.
mysql-instances:
- source-id: "aurora-replica-01"
  # The configuration items of the block and allow lists of the schema or table to be migrated, used to quote the global block and allow lists configuration. For global configuration, see the `block-allow-list` below.
  block-allow-list: "global"
  mydumper-config-name: "global"

- source-id: "aurora-replica-02"
  block-allow-list: "global"
  mydumper-config-name: "global"

# The configuration of block and allow lists.
block-allow-list:
  global:                             # Quoted by block-allow-list: "global" above
    do-dbs: ["migrate_me"]            # The allow list of the upstream table to be migrated. Database tables that are not in the allow list will not be migrated.

# The configuration of the dump unit.
mydumpers:
   global:                            # Quoted by mydumper-config-name: "global" above
    extra-args: "--consistency none"  # Aurora does not support FTWRL, you need to configure this option to bypass FTWRL.

Step 5: Start the task

Start the task using dmctl through TiUP.

tiup dmctl --master-addr 127.0.0.1:8261 start-task /absolute/path/to/task.yaml --remove-meta

If the task is successfully started, the following information is returned:

{
    "result": true,
    "msg": "",
    "sources": [
        {
            "result": true,
            "msg": "",
            "source": "aurora-replica-01",
            "worker": "one-dm-worker-ID"
        },
        {
            "result": true,
            "msg": "",
            "source": "aurora-replica-02",
            "worker": "another-dm-worker-ID"
        }
    ]
}

If source db replication privilege checker and source db dump privilege checker errors are in the returned information, check whether unrecognized privileges exsit in the errorMsg field. For example:

line 1 column 287 near \"INVOKE LAMBDA ON *.* TO...

The returned information above shows that the INVOKE LAMBDA privilege causes an error. If the privilege is Aurora-specific, add the following content to the configuration file to skip the check. DM will improve the automatic handling of Aurora privileges in later versions.

ignore-checking-items: ["replication_privilege","dump_privilege"]

Step 6: Query the task and validate the data

Use dmctl through TiUP to query information of the on-going migration task and the task status.

tiup dmctl --master-addr 127.0.0.1:8261 query-status

If the task is running normally, the following information is returned.

{
    "result": true,
    "msg": "",
    "tasks": [
        {
            "taskName": "test",
            "taskStatus": "Running",
            "sources": [
                "aurora-replica-01",
                "aurora-replica-02"
            ]
        }
    ]
}

You can query data in the downstream, modify data in Aurora, and validate the data migrated to TiDB.

Download PDFRequest docs changes
Was this page helpful?
Open Source Ecosystem
TiDB
TiKV
TiSpark
Chaos Mesh
© 2022 PingCAP. All Rights Reserved.