Migrate Data to a Downstream TiDB Table with More Columns

This document provides the additional steps to be taken when you migrate data to a downstream TiDB table with more columns than the corresponding upstream table. For regular migration steps, see the following migration scenarios:

Use DM to migrate data to a downstream TiDB table with more columns

When replicating the upstream binlog, DM tries to use the current table schema of the downstream to parse the binlog and generate the corresponding DML statements. If the column number of the table in the upstream binlog does not match the column number in the downstream table schema, the following error occurs:

"errors": [
    {
        "ErrCode": 36027,
        "ErrClass": "sync-unit",
        "ErrScope": "internal",
        "ErrLevel": "high",
        "Message": "startLocation: [position: (mysql-bin.000001, 2022), gtid-set:09bec856-ba95-11ea-850a-58f2b4af5188:1-9 ], endLocation: [ position: (mysql-bin.000001, 2022), gtid-set: 09bec856-ba95-11ea-850a-58f2b4af5188:1-9]: gen insert sqls failed, schema: log, table: messages: Column count doesn't match value count: 3 (columns) vs 2 (values)",
        "RawCause": "",
        "Workaround": ""
    }
]

The following is an example upstream table schema:

# Upstream table schema
CREATE TABLE `messages` (
  `id` int(11) NOT NULL,
  PRIMARY KEY (`id`)
)

The following is an example downstream table schema:

# Downstream table schema
CREATE TABLE `messages` (
  `id` int(11) NOT NULL,
  `message` varchar(255) DEFAULT NULL, # This is the additional column that only exists in the downstream table.
  PRIMARY KEY (`id`)
)

When DM tries to use the downstream table schema to parse the binlog event generated by the upstream, DM reports the above Column count doesn't match error.

In such cases, you can use the operate-schema command to set a table schema for the table to be migrated from the data source. The specified table schema needs to correspond to the binlog event data to be replicated by DM. If you are migrating sharded tables, for each sharded table, you need to set a table schema in DM to parse binlog event data. The steps are as follows:

  1. Create a SQL file in DM and add the CREATE TABLE statement that corresponds to the upstream table schema to the file. For example, save the following table schema to log.messages.sql.

    # Upstream table schema
    CREATE TABLE `messages` (
    `id` int(11) NOT NULL,
    PRIMARY KEY (`id`)
    )
  2. Use the operate-schema command to set the table schema for the table to be migrated from the data source (At this time, the data migration task should be in the Paused state due to the above Column count doesn't match error).

    tiup dmctl --master-addr ${advertise-addr} operate-schema set -s ${source-id} ${task-name} -d ${database-name} -t ${table-name} ${schema-file}

    The descriptions of parameters in this command are as follows:

    ParameterDescription
    -master-addrSpecifies the ${advertise-addr} of any DM-master node in the cluster where dmctl is to be connected. ${advertise-addr} indicates the address that DM-master advertises to the outside world.
    operate-schema setManually set the schema information.
    -sSpecifies the source. ${source-id} indicates the source ID of MySQL data. ${task-name} indicates the name of the synchronization task defined in the task.yaml configuration file of the data migration task.
    -dSpecifies the database. ${database-name} indicates the name of the upstream database.
    -tSpecifies table. ${table-name} indicates the name of the upstream data table. ${schema-file} indicates the table schema file to be set.

    For example:

    tiup dmctl --master-addr 172.16.10.71:8261 operate-schema set -s mysql-01 task-test -d log -t message log.message.sql
  3. Use the resume-task command to resume the migration task in the Paused state.

    tiup dmctl --master-addr ${advertise-addr} resume-task ${task-name}
  4. Use the query-status command to confirm that the data migration task is running correctly.

    tiup dmctl --master-addr ${advertise-addr} query-status resume-task ${task-name}