TiCDC Overview
Note:
TiCDC has been generally available (GA) since v4.0.6. You can use it in production environments.
TiCDC is a tool for replicating the incremental data of TiDB, implemented by pulling change logs from TiKV. It can restore data to a state consistent with any TSO in the upstream, and it provides the TiCDC Open Protocol so that other systems can subscribe to data changes.
TiCDC Architecture
TiCDC runs as stateless nodes that achieve high availability through the etcd service embedded in PD. A TiCDC cluster supports creating multiple replication tasks to replicate data to multiple different downstream platforms.
The architecture of TiCDC is shown in the following figure:
System roles
- TiKV CDC component: Only outputs key-value (KV) change logs.
  - Assembles KV change logs in the internal logic.
  - Provides the interface to output KV change logs. The data sent includes real-time change logs and incremental scan change logs.
- `capture`: The operating process of TiCDC. Multiple `capture`s form a TiCDC cluster that replicates KV change logs.
  - Each `capture` pulls a part of the KV change logs.
  - Sorts the pulled KV change logs.
  - Restores the transactions to the downstream or outputs the logs based on the TiCDC Open Protocol.
Replication features
This section introduces the replication features of TiCDC.
Sink support
Currently, the TiCDC sink component supports replicating data to the following downstream platforms:
- Databases compatible with the MySQL protocol. The sink component provides eventual consistency support (see the example after this list).
- Kafka based on the TiCDC Open Protocol. The sink component can ensure row-level order, eventual consistency, or strict transactional consistency.
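For example, a replication task that targets a MySQL-compatible database can be created with the `cdc cli` tool. The following is a minimal sketch; the PD endpoint, credentials, and changefeed ID are placeholders, not values from this document:

```shell
# Create a changefeed that replicates incremental data to a MySQL-compatible sink.
# The PD endpoint, user, password, and changefeed ID are placeholders.
cdc cli changefeed create \
    --pd=http://10.0.10.25:2379 \
    --sink-uri="mysql://root:123456@127.0.0.1:3306/" \
    --changefeed-id="simple-replication-task"
```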
Ensure replication order and consistency
Replication order
For all DDL or DML statements, TiCDC outputs them at least once.
When the TiKV or TiCDC cluster encounters a failure, TiCDC might send the same DDL/DML statement repeatedly. For duplicated DDL/DML statements:
- The MySQL sink can execute DDL statements repeatedly. For DDL statements that can be executed repeatedly in the downstream, such as `truncate table`, the statement is executed successfully. For those that cannot be executed repeatedly, such as `create table`, the execution fails, and TiCDC ignores the error and continues the replication.
- The Kafka sink sends messages repeatedly, but the duplicate messages do not affect the constraints of `Resolved Ts`. Users can filter the duplicated messages from Kafka consumers.
Replication consistency
MySQL sink
- TiCDC does not split single-table transactions and ensures the atomicity of single-table transactions.
- TiCDC does not ensure that the execution order of downstream transactions is the same as that of upstream transactions.
- TiCDC splits cross-table transactions in the unit of table and does not ensure the atomicity of cross-table transactions.
- TiCDC ensures that the order of single-row updates is consistent with that in the upstream.
Kafka sink
- TiCDC provides different strategies for data distribution. You can distribute data to different Kafka partitions based on the table, primary key, or timestamp (see the sketch after this list).
- Depending on the distribution strategy, different consumer implementations can achieve different levels of consistency, including row-level consistency, eventual consistency, or cross-table transactional consistency.
- TiCDC does not implement a Kafka consumer; it only provides the TiCDC Open Protocol. You can implement a Kafka consumer according to this protocol.
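As a sketch of how a distribution strategy can be configured, a changefeed can take a configuration file with dispatcher rules together with a Kafka sink URI. The table matchers, broker address, topic name, and PD endpoint below are placeholders:

```shell
# Write dispatcher rules that distribute rows from hypothetical tables
# to Kafka partitions by timestamp ("ts") or by table name ("table").
cat > changefeed.toml <<'EOF'
[sink]
dispatchers = [
    {matcher = ['test1.*'], dispatcher = "ts"},
    {matcher = ['test2.*'], dispatcher = "table"},
]
EOF

# Create a Kafka changefeed that uses the configuration above.
cdc cli changefeed create \
    --pd=http://10.0.10.25:2379 \
    --sink-uri="kafka://127.0.0.1:9092/ticdc-test?partition-num=3" \
    --config changefeed.toml
```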
Restrictions
TiCDC only replicates tables that have at least one valid index. A valid index is defined as follows:
- The primary key (`PRIMARY KEY`) is a valid index.
- A unique index (`UNIQUE INDEX`) that meets both of the following conditions is a valid index:
  - Every column of the index is explicitly defined as non-nullable (`NOT NULL`).
  - The index does not contain a virtual generated column (`VIRTUAL GENERATED COLUMNS`).
Since v4.0.8, TiCDC supports replicating tables without a valid index by modifying the task configuration. However, this compromises the guarantee of data consistency to some extent. For more details, see Replicate tables without a valid index.
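For illustration, the following sketch contrasts a table that TiCDC replicates with one that it skips by default; the connection parameters, table names, and columns are hypothetical:

```shell
# Connect to TiDB and create two tables (address and port are placeholders).
mysql -h 127.0.0.1 -P 4000 -u root <<'EOF'
-- Replicated: the primary key is a valid index.
CREATE TABLE t_valid (id BIGINT PRIMARY KEY, v VARCHAR(32));

-- Skipped by default: the only unique index allows NULL values,
-- so the table has no valid index.
CREATE TABLE t_no_valid_index (id BIGINT NULL, v VARCHAR(32), UNIQUE KEY uk_id (id));
EOF
```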
Unsupported scenarios
Currently, the following scenarios are not supported:
- A TiKV cluster that uses RawKV alone.
- The DDL operation `CREATE SEQUENCE` and the `SEQUENCE` function in TiDB v4.0. When the upstream TiDB uses `SEQUENCE`, TiCDC ignores `SEQUENCE` DDL operations/functions performed upstream. However, DML operations using `SEQUENCE` functions can be correctly replicated.
- The TiKV Hibernate Region feature. TiCDC prevents Regions from entering the hibernated state.
TiCDC only provides partial support for large transactions in the upstream. For details, refer to the FAQ: Does TiCDC support replicating large transactions? Is there any risk?
Notice for compatibility issues
Incompatibility issue caused by using the TiCDC v5.0.0-rc `cdc cli` tool to operate a v4.0.x cluster
When using the `cdc cli` tool of TiCDC v5.0.0-rc to operate a v4.0.x TiCDC cluster, you might encounter the following abnormal situations:
- If the TiCDC cluster is v4.0.8 or an earlier version, using the v5.0.0-rc `cdc cli` tool to create a replication task might cause cluster anomalies and get the replication task stuck.
- If the TiCDC cluster is v4.0.9 or a later version, using the v5.0.0-rc `cdc cli` tool to create a replication task will cause the old value and unified sorter features to be unexpectedly enabled by default.
Solutions: Use the `cdc` executable file corresponding to the TiCDC cluster version to perform the following operations:
- Delete the changefeed created using the v5.0.0-rc `cdc cli` tool. For example, run the `tiup cdc:v4.0.9 cli changefeed remove -c xxxx --pd=xxxxx --force` command.
- If the replication task is stuck, restart the TiCDC cluster. For example, run the `tiup cluster restart <cluster_name> -R cdc` command.
- Re-create the changefeed. For example, run the `tiup cdc:v4.0.9 cli changefeed create --sink-uri=xxxx --pd=xxx` command.
Note:
The above issue exists only when `cdc cli` is v5.0.0-rc. Other v5.0.x `cdc cli` tools are compatible with v4.0.x clusters.
Install and deploy TiCDC
You can either deploy TiCDC along with a new TiDB cluster or add the TiCDC component to an existing TiDB cluster. For details, see Deploy TiCDC.
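As a rough sketch of the second option, a TiCDC node can be added to a TiUP-managed cluster through a scale-out topology file; the host and cluster name below are placeholders:

```shell
# Declare a TiCDC node in a scale-out topology file
# (8300 is the default TiCDC port; the host is a placeholder).
cat > scale-out-cdc.yaml <<'EOF'
cdc_servers:
  - host: 10.0.1.20
    port: 8300
EOF

# Apply the topology to an existing TiUP-managed cluster.
tiup cluster scale-out <cluster_name> scale-out-cdc.yaml
```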
Manage TiCDC Cluster and Replication Tasks
Currently, you can use the `cdc cli` tool to manage the status of a TiCDC cluster and data replication tasks. For details, see:
- Use `cdc cli` to manage cluster status and data replication tasks
- Use the HTTP interface to manage cluster status and data replication tasks
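For instance, here is a quick sketch of checking task and node state through both interfaces; the PD endpoint, node address, and the assumption of the default port 8300 are placeholders:

```shell
# List existing replication tasks with the cdc cli tool.
cdc cli changefeed list --pd=http://10.0.10.25:2379

# Query a TiCDC node over its HTTP interface.
curl http://127.0.0.1:8300/status
```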
Troubleshoot TiCDC
For details, refer to Troubleshoot TiCDC.
TiCDC Open Protocol
TiCDC Open Protocol is a row-level data change notification protocol that provides data sources for monitoring, caching, full-text indexing, analysis engines, and primary-secondary replication between different databases. TiCDC complies with the TiCDC Open Protocol and replicates data changes of TiDB to third-party data media such as MQ (Message Queue). For more information, see TiCDC Open Protocol.