Sink to MySQL
This document describes how to stream data from TiDB Cloud to MySQL using the Sink to MySQL changefeed.
Restrictions
- For each TiDB Cloud cluster, you can create up to 100 changefeeds.
- Because TiDB Cloud uses TiCDC to establish changefeeds, it has the same restrictions as TiCDC.
- If the table to be replicated does not have a primary key or a non-null unique index, the absence of a unique constraint during replication could result in duplicated data being inserted downstream in some retry scenarios.
Prerequisites
Before creating a changefeed, you need to complete the following prerequisites:
- Set up your network connection
- Export and load the existing data to MySQL (optional)
- Create corresponding target tables in MySQL if you do not load the existing data and only want to replicate incremental data to MySQL
Network
Make sure that your TiDB Cluster can connect to the MySQL service.
If your MySQL service is in an AWS VPC that has no public internet access, take the following steps:
Set up a VPC peering connection between the VPC of the MySQL service and your TiDB cluster.
Modify the inbound rules of the security group that the MySQL service is associated with.
You must add the CIDR of the region where your TiDB Cloud cluster is located to the inbound rules. Doing so allows the traffic to flow from your TiDB Cluster to the MySQL instance.
If the MySQL URL contains a hostname, you need to allow TiDB Cloud to be able to resolve the DNS hostname of the MySQL service.
- Follow the steps in Enable DNS resolution for a VPC peering connection.
- Enable the Accepter DNS resolution option.
If your MySQL service is in a Google Cloud VPC that has no public internet access, take the following steps:
If your MySQL service is Google Cloud SQL, you must expose a MySQL endpoint in the associated VPC of the Google Cloud SQL instance. You may need to use the Cloud SQL Auth proxy which is developed by Google.
Set up a VPC peering connection between the VPC of the MySQL service and your TiDB cluster.
Modify the ingress firewall rules of the VPC where MySQL is located.
You must add the CIDR of the region where your TiDB Cloud cluster is located to the ingress firewall rules. Doing so allows the traffic to flow from your TiDB Cluster to the MySQL endpoint.
Load existing data (optional)
The Sink to MySQL connector can only sink incremental data from your TiDB cluster to MySQL after a certain timestamp. If you already have data in your TiDB cluster, you can export and load the existing data of your TiDB cluster into MySQL before enabling Sink to MySQL.
To load the existing data:
Extend the tidb_gc_life_time to be longer than the total time of the following two operations, so that historical data during the time is not garbage collected by TiDB.
- The time to export and import the existing data
- The time to create Sink to MySQL
For example:
SET GLOBAL tidb_gc_life_time = '720h';Use Dumpling to export data from your TiDB cluster, then use community tools such as mydumper/myloader to load data to the MySQL service.
From the exported files of Dumpling, get the start position of MySQL sink from the metadata file:
The following is a part of an example metadata file. The
Pos
ofSHOW MASTER STATUS
is the TSO of the existing data, which is also the start position of MySQL sink.Started dump at: 2020-11-10 10:40:19 SHOW MASTER STATUS: Log: tidb-binlog Pos: 420747102018863124 Finished dump at: 2020-11-10 10:40:20
Create target tables in MySQL
If you do not load the existing data, you need to create corresponding target tables in MySQL manually to store the incremental data from TiDB. Otherwise, the data will not be replicated.
Create a MySQL sink
After completing the prerequisites, you can sink your data to MySQL.
Navigate to the cluster overview page of the target TiDB cluster, and then click Changefeed in the left navigation pane.
Click Create Changefeed, and select MySQL as Target Type.
Fill in the MySQL endpoints, user name, and password in MySQL Connection.
Click Next to test whether TiDB can connect to MySQL successfully:
- If yes, you are directed to the next step of configuration.
- If not, a connectivity error is displayed, and you need to handle the error. After the error is resolved, click Next again.
Customize Table Filter to filter the tables that you want to replicate. For the rule syntax, refer to table filter rules.
- Filter Rules: you can set filter rules in this column. By default, there is a rule
*.*
, which stands for replicating all tables. When you add a new rule, TiDB Cloud queries all the tables in TiDB and displays only the tables that match the rules in the box on the right. You can add up to 100 filter rules. - Tables with valid keys: this column displays the tables that have valid keys, including primary keys or unique indexes.
- Tables without valid keys: this column shows tables that lack primary keys or unique keys. These tables present a challenge during replication because the absence of a unique identifier can result in inconsistent data when the downstream handles duplicate events. To ensure data consistency, it is recommended to add unique keys or primary keys to these tables before initiating the replication. Alternatively, you can add filter rules to exclude these tables. For example, you can exclude the table
test.tbl1
by using the rule"!test.tbl1"
.
- Filter Rules: you can set filter rules in this column. By default, there is a rule
Customize Event Filter to filter the events that you want to replicate.
- Tables matching: you can set which tables the event filter will be applied to in this column. The rule syntax is the same as that used for the preceding Table Filter area. You can add up to 10 event filter rules per changefeed.
- Ignored events: you can set which types of events the event filter will exclude from the changefeed.
In Start Replication Position, configure the starting position for your MySQL sink.
- If you have loaded the existing data using Dumpling, select Start replication from a specific TSO and fill in the TSO that you get from Dumpling exported metadata files.
- If you do not have any data in the upstream TiDB cluster, select Start replication from now on.
- Otherwise, you can customize the start time point by choosing Start replication from a specific time.
Click Next to configure your changefeed specification.
- In the Changefeed Specification area, specify the number of Replication Capacity Units (RCUs) to be used by the changefeed.
- In the Changefeed Name area, specify a name for the changefeed.
Click Next to review the changefeed configuration.
If you confirm that all configurations are correct, check the compliance of cross-region replication, and click Create.
If you want to modify some configurations, click Previous to go back to the previous configuration page.
The sink starts soon, and you can see the status of the sink changes from Creating to Running.
Click the changefeed name, and you can see more details about the changefeed, such as the checkpoint, replication latency, and other metrics.
If you have loaded the existing data using Dumpling, you need to restore the GC time to its original value (the default value is
10m
) after the sink is created:
SET GLOBAL tidb_gc_life_time = '10m';