Data Migration Shard Merge Scenario
This document shows how to use Data Migration (DM) to migrate data to the downstream TiDB in the shard merge scenario.
The example used in this document is a simple scenario where sharded schemas and sharded tables of two data source MySQL instances need to be migrated to a downstream TiDB cluster.
For other scenarios, you can refer to Best Practices of Data Migration in the Shard Merge Scenario.
Data source instances
Assume that the data source structures are as follows:
Instance 1
Schema Tables user information, log_bak store_01 sale_01, sale_02 store_02 sale_01, sale_02 Instance 2
Schema Tables user information, log_bak store_01 sale_01, sale_02 store_02 sale_01, sale_02
Migration requirements
- Merge the
user
.information
tables to the downstreamuser
.information
table in TiDB. - Merge the
store_{01|02}
.sale_{01|02}
tables in the above instances to the downstreamstore
.sale
table in TiDB. - Replicate
user
andstore_{01|02}
schemas but do not replicate theuser
.log_bak
tables in the above instances. - Filter out all the delete operations in the
store_{01|02}
.sale_{01|02}
table of the above instances and filter out thedrop database
operation in shemas.
The expected downstream schema after migration is as follows:
Schema | Tables |
---|---|
user | information |
store | sale |
Conflict check across sharded tables
Because migration requirements #1 and #2 involve the DM Shard Merge feature, data from multiple tables might cause conflicts between the primary keys or the unique keys. You need to check these sharded tables. For details, refer to Handle conflicts between primary keys or unique indexes across multiple sharded tables. In this example:
The table schema of user
.information
is
CREATE TABLE `information` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`uid` bigint(20) DEFAULT NULL,
`name` varchar(255) DEFAULT NULL,
`data` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `uid` (`uid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
In the above structure, column id
is the primary key and column uid
is the unique index. Column id
has auto-increment attribute and if the ranges of tables overlap, data conflicts might occur. Column uid
can ensure only a unique index exists globally. So, you can avoid column id
by following the steps in the section Remove the PRIMARY KEY
attribute from the column.
The table schema of store_{01|02}
.sale_{01|02}
is
CREATE TABLE `sale_01` (
`sid` bigint(20) NOT NULL,
`pid` bigint(20) NOT NULL,
`comment` varchar(255) DEFAULT NULL,
PRIMARY KEY (`sid`),
KEY `pid` (`pid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
In the above structure, sid
is the shard key, which can ensure that the same sid
only exists in one sharded table. So no data conflict is caused and you do not need to perform extra operations.
Migration solution
To satisfy the migration requirements #1, you do not need to configure the table routing rule. You need to manually create a table based on the requirements in the section Remove the
PRIMARY KEY
attribute from the column:CREATE TABLE `information` ( `id` bigint(20) NOT NULL AUTO_INCREMENT, `uid` bigint(20) DEFAULT NULL, `name` varchar(255) DEFAULT NULL, `data` varchar(255) DEFAULT NULL, INDEX (`id`), UNIQUE KEY `uid` (`uid`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1And skip precheck in the configuration file:
ignore-checking-items: ["auto_increment_ID"]To satisfy the migration requirement #2, configure the table routing rule as follows:
routes: ... store-route-rule: schema-pattern: "store_*" target-schema: "store" sale-route-rule: schema-pattern: "store_*" table-pattern: "sale_*" target-schema: "store" target-table: "sale"To satisfy the migration requirements #3, configure the Block and allow table lists as follows:
block-allow-list: log-bak-ignored: do-dbs: ["user", "store_*"] ignore-tables: - db-name: "user" tbl-name: "log_bak"To satisfy the migration requirement #4, configure the binlog event filter rule as follows:
filters: ... sale-filter-rule: # filter out all deletion operations of all tables under store_* schema schema-pattern: "store_*" table-pattern: "sale_*" events: ["truncate table", "drop table", "delete"] action: Ignore store-filter-rule: # filter out the deletion operation of store_* schema schema-pattern: "store_*" events: ["drop database"] action: Ignore
Migration task configuration
The complete configuration of the migration task is shown as follows. For more details, see Data Migration Task Configuration Guide.
name: "shard_merge"
task-mode: all # full data migration + incremental data migration
meta-schema: "dm_meta"
ignore-checking-items: ["auto_increment_ID"]
target-database:
host: "192.168.0.1"
port: 4000
user: "root"
password: ""
mysql-instances:
-
source-id: "instance-1" # The ID of the data source and can be obtained from the data source configuration
route-rules: ["store-route-rule", "sale-route-rule"] # Applies to the table route rules of this data source
filter-rules: ["store-filter-rule" , "sale-filter-rule"] # Applies to the binlog event filter rules of this data source
block-allow-list: "log-bak-ignored" # Applies to the block and allow lists of this data source
-
source-id: "instance-2"
route-rules: ["store-route-rule", "sale-route-rule"]
filter-rules: ["store-filter-rule", "sale-filter-rule"]
block-allow-list: "log-bak-ignored"
# Other common configs shared by all instances
routes:
store-route-rule:
schema-pattern: "store_*"
target-schema: "store"
sale-route-rule:
schema-pattern: "store_*"
table-pattern: "sale_*"
target-schema: "store"
target-table: "sale"
filters:
sale-filter-rule:
schema-pattern: "store_*"
table-pattern: "sale_*"
events: ["truncate table", "drop table", "delete"]
action: Ignore
store-filter-rule:
schema-pattern: "store_*"
events: ["drop database"]
action: Ignore
block-allow-list:
log-bak-ignored:
do-dbs: ["user", "store_*"]
ignore-tables:
- db-name: "user"
tbl-name: "log_bak"