TiDB Lightning Backends
The backend determines how TiDB Lightning imports data into the target cluster.
TiDB Lightning supports the following backends:
- The Importer-backend (default): tidb-lightning first encodes the SQL or CSV data into KV pairs, and relies on the external tikv-importer program to sort these KV pairs and ingest them directly into the TiKV nodes.
- The Local-backend: tidb-lightning first encodes data into key-value pairs, sorts and stores them in a local temporary directory, and writes these key-value pairs to each TiKV node in batches. Then, TiKV ingests these key-value pairs into the cluster. The implementation of the Local-backend is the same as that of the Importer-backend, except that it does not rely on the external tikv-importer component.
- The TiDB-backend: tidb-lightning first encodes the data into SQL INSERT statements, and has these statements executed directly on the TiDB node.
Backend | Local-backend | Importer-backend | TiDB-backend |
---|---|---|---|
Speed | Fast (~500 GB/hr) | Fast (~300 GB/hr) | Slow (~50 GB/hr) |
Resource usage | High | High | Low |
Network bandwidth usage | High | Medium | Low |
ACID respected while importing | No | No | Yes |
Target tables | Must be empty | Must be empty | Can be populated |
Additional component required | No | tikv-importer | No |
TiDB versions supported | >= v4.0.0 | All | All |
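The speed row gives a quick way to budget import time. The sketch below divides a data size by the approximate throughput in the table; these figures are ballpark values, so treat the result as an order-of-magnitude estimate only. The dictionary keys reuse the values of the `backend` setting; the function name is hypothetical.

```python
# Approximate throughput per backend, in GB per hour, taken from the
# comparison table. Keys reuse the values of the `backend` setting.
APPROX_GB_PER_HOUR = {
    "local": 500,
    "importer": 300,
    "tidb": 50,
}


def estimate_hours(data_size_gb: float, backend: str) -> float:
    """Estimate import duration in hours (ballpark only)."""
    if data_size_gb < 0:
        raise ValueError("data size must be non-negative")
    if backend not in APPROX_GB_PER_HOUR:
        raise ValueError(f"unknown backend: {backend!r}")
    return data_size_gb / APPROX_GB_PER_HOUR[backend]


if __name__ == "__main__":
    # 1 TB is treated as 1000 GB here.
    for name in ("local", "importer", "tidb"):
        print(f"{name}: ~{estimate_hours(1000, name):.1f} h")
```

For 1 TB this gives roughly 2 hours with the Local-backend, 3.3 hours with the Importer-backend, and 20 hours with the TiDB-backend.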
How to choose the backend modes
- If the target cluster is v4.0 or later, consider using the Local-backend mode first. It is easier to use and performs better than the other two modes.
- If the target cluster is v3.x or earlier, it is recommended to use the Importer-backend mode.
- If the target cluster is in an online production environment, or if the target table already contains data, it is recommended to use the TiDB-backend mode.
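The three guidelines can be combined into a small decision function. This is a sketch, not part of TiDB Lightning: `choose_backend` and its parameters are hypothetical. The populated-table/production check comes first because the comparison table lists the Local- and Importer-backends as requiring empty target tables and not respecting ACID while importing.

```python
from typing import Tuple


def choose_backend(cluster_version: Tuple[int, int, int],
                   online_production: bool,
                   target_has_data: bool) -> str:
    """Pick a backend name following the guidelines above (a sketch)."""
    # Local- and Importer-backend need empty target tables and do not
    # respect ACID while importing, so populated or production targets
    # go to the TiDB-backend regardless of version.
    if online_production or target_has_data:
        return "tidb"
    # Local-backend supports target clusters of v4.0.0 or later.
    if cluster_version >= (4, 0, 0):
        return "local"
    # v3.x and earlier: Importer-backend.
    return "importer"


if __name__ == "__main__":
    print(choose_backend((5, 0, 0), online_production=False, target_has_data=False))
```

The return values match the strings accepted by the `backend` setting, so the result can be written straight into the configuration file.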
TiDB Lightning Local-backend
The Local-backend feature was introduced in TiDB Lightning in TiDB v4.0.3. You can use this feature to import data into TiDB clusters of v4.0.0 or later.
Deployment for Local-backend
To deploy TiDB Lightning in the Local-backend mode, see TiDB Lightning Deployment.
TiDB Lightning TiDB-backend
Note:
Since TiDB v4.0, PingCAP no longer maintains the Loader tool. Since v5.0, the Loader documentation is no longer available. Loader's functionality has been completely replaced by the TiDB-backend of TiDB Lightning, so it is highly recommended to switch to TiDB Lightning.
Deployment for TiDB-backend
When using the TiDB-backend, you do not need to deploy tikv-importer. Compared with the standard deployment procedure, the TiDB-backend deployment has the following two differences:

- All steps involving tikv-importer can be skipped.
- The configuration must be changed to declare that the TiDB-backend is used.
Hardware requirements
The speed of TiDB Lightning using the TiDB-backend is limited by the SQL processing speed of TiDB. Therefore, even a lower-end machine may be enough to reach the maximum possible performance. The recommended hardware configuration is:
- 16 logical cores CPU
- An SSD large enough to store the entire data source, preferring higher read speed
- 1 Gigabit network card
Manual deployment
You do not need to download and configure tikv-importer. You can download TiDB Lightning from here.

Before running tidb-lightning, add the following lines to the configuration file:

[tikv-importer]
backend = "tidb"

Alternatively, supply the --backend tidb argument when executing tidb-lightning.
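Both ways of selecting the backend can be generated from a script, for example when preparing several import tasks. A minimal sketch: `importer_section` and `lightning_argv` are hypothetical helpers, and only the `-config` flag, the `--backend` argument, and the backend names come from this page. The argument list is built, not executed.

```python
from typing import List

VALID_BACKENDS = ("importer", "local", "tidb")


def importer_section(backend: str) -> str:
    """Render the [tikv-importer] section that selects a backend."""
    if backend not in VALID_BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    return f'[tikv-importer]\nbackend = "{backend}"\n'


def lightning_argv(config_path: str, backend: str) -> List[str]:
    """Build the argument list that passes the backend on the command line."""
    if backend not in VALID_BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    return ["./tidb-lightning", "-config", config_path, "--backend", backend]


if __name__ == "__main__":
    print(importer_section("tidb"), end="")
    print(" ".join(lightning_argv("tidb-lightning.toml", "tidb")))
```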
Conflict resolution
The TiDB-backend supports importing to an already-populated table. However, the new data might cause a unique key conflict with the old data. You can control how to resolve the conflict by using this task configuration.
[tikv-importer]
backend = "tidb"
on-duplicate = "replace" # or "error" or "ignore"
Setting | Behavior on conflict | Equivalent SQL statement |
---|---|---|
replace | New entries replace old ones | REPLACE INTO ... |
ignore | Keep old entries and ignore new ones | INSERT IGNORE INTO ... |
error | Abort import | INSERT INTO ... |
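To see which rows survive under each setting, the sketch below models a table as an in-memory dictionary keyed by a single unique key. This is a simplification, and `apply_rows` and `DuplicateKeyError` are hypothetical names. A real `REPLACE INTO` removes every existing row that conflicts on any unique key. The sketch also does not model rows that were already written before an `error` abort.

```python
from typing import Dict, Iterable, Tuple


class DuplicateKeyError(Exception):
    """Raised for on-duplicate = "error" (the import aborts)."""


def apply_rows(table: Dict[str, str],
               new_rows: Iterable[Tuple[str, str]],
               on_duplicate: str) -> Dict[str, str]:
    """Model how each on-duplicate setting resolves unique-key conflicts."""
    if on_duplicate not in ("replace", "ignore", "error"):
        raise ValueError(f"unknown on-duplicate value: {on_duplicate!r}")
    result = dict(table)  # work on a copy; the input is left untouched
    for key, value in new_rows:
        if key not in result:
            result[key] = value          # no conflict: plain insert
        elif on_duplicate == "replace":
            result[key] = value          # like REPLACE INTO: new row wins
        elif on_duplicate == "ignore":
            continue                     # like INSERT IGNORE INTO: old row kept
        else:
            # like INSERT INTO: a duplicate key is an error
            raise DuplicateKeyError(f"duplicate unique key {key!r}")
    return result


if __name__ == "__main__":
    old = {"1": "alice"}
    rows = [("1", "bob"), ("2", "carol")]
    print(apply_rows(old, rows, "replace"))
    print(apply_rows(old, rows, "ignore"))
```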
Migrating from Loader to TiDB Lightning TiDB-backend
If you need to import data into a TiDB cluster, TiDB Lightning with the TiDB-backend can completely replace the functionality of Loader. The following table shows how to translate Loader configurations into TiDB Lightning configurations.
Loader | TiDB Lightning |
---|---|
TiDB Lightning Importer-backend
Deployment for Importer-backend mode
This section describes how to deploy TiDB Lightning manually in the Importer-backend mode:
Hardware requirements
tidb-lightning and tikv-importer are both resource-intensive programs. It is recommended to deploy them on two separate machines.
To achieve the best performance, it is recommended to use the following hardware configuration:
tidb-lightning:

- 32+ logical cores CPU
- An SSD large enough to store the entire data source, preferring higher read speed
- 10 Gigabit network card (capable of transferring at ≥300 MB/s)

tidb-lightning fully consumes all CPU cores when running, so deploying it on a dedicated machine is highly recommended. If this is not possible, tidb-lightning can be deployed together with other components like tidb-server, and its CPU usage can be limited via the region-concurrency setting.
tikv-importer:

- 32+ logical cores CPU
- 40 GB+ memory
- 1 TB+ SSD, preferring higher IOPS (≥ 8000 is recommended)
  - The disk should be larger than the total size of the top N tables, where N = max(index-concurrency, table-concurrency).
- 10 Gigabit network card (capable of transferring at ≥300 MB/s)

tikv-importer fully consumes all CPU, disk I/O and network bandwidth when running, so deploying it on a dedicated machine is strongly recommended.
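The disk-size rule for tikv-importer is easy to check before provisioning. The sketch below returns the total size of the N largest tables, where N = max(index-concurrency, table-concurrency). The tikv-importer disk should be larger than this value. `min_importer_disk` is a hypothetical helper, and the table sizes and concurrency values in the example are made up.

```python
from typing import Iterable


def min_importer_disk(table_sizes: Iterable[int],
                      index_concurrency: int,
                      table_concurrency: int) -> int:
    """Total size of the N largest tables, N = max(index-, table-concurrency).

    The tikv-importer disk should be larger than the returned value. Units
    are whatever the input uses (for example, GB).
    """
    n = max(index_concurrency, table_concurrency)
    if n < 1:
        raise ValueError("concurrency settings must be at least 1")
    return sum(sorted(table_sizes, reverse=True)[:n])


if __name__ == "__main__":
    sizes_gb = [120, 80, 300, 50, 200]   # hypothetical table sizes
    print(min_importer_disk(sizes_gb, index_concurrency=2, table_concurrency=3))
```

With these inputs N is 3, so the three largest tables (300 + 200 + 120 GB) give a lower bound of 620 GB.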
If you have sufficient machines, you can deploy multiple tidb-lightning + tikv-importer server pairs, with each pair working on a distinct set of tables, to import the data in parallel.
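When splitting tables across several tidb-lightning + tikv-importer pairs, it helps to balance the total data size each pair handles. The sketch below uses a greedy "largest table first, to the least-loaded pair" heuristic. That heuristic is an assumption chosen for illustration, not something TiDB Lightning does itself. How each pair is then restricted to its assigned tables is outside this sketch. `partition_tables` and the example sizes are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def partition_tables(table_sizes: Dict[str, int], num_pairs: int) -> List[List[str]]:
    """Greedily assign tables to pairs, largest first, to balance total size."""
    if num_pairs < 1:
        raise ValueError("need at least one tidb-lightning + tikv-importer pair")
    # Min-heap of (size assigned so far, pair index).
    heap: List[Tuple[int, int]] = [(0, i) for i in range(num_pairs)]
    groups: List[List[str]] = [[] for _ in range(num_pairs)]
    # Largest tables first; ties broken by name for a deterministic result.
    for name, size in sorted(table_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        assigned, idx = heapq.heappop(heap)  # least-loaded pair so far
        groups[idx].append(name)
        heapq.heappush(heap, (assigned + size, idx))
    return groups


if __name__ == "__main__":
    sizes_gb = {"orders": 300, "users": 200, "items": 120, "logs": 80, "tags": 50}
    print(partition_tables(sizes_gb, num_pairs=2))
```

In this example the two pairs end up with 380 GB and 370 GB of data respectively.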
Deploy TiDB Lightning manually
Step 1: Deploy a TiDB cluster
Before importing data, you need to have a deployed TiDB cluster, with the cluster version 2.0.9 or above. It is highly recommended to use the latest version.
You can find deployment instructions in the TiDB Quick Start Guide.
Step 2: Download the TiDB Lightning installation package
Refer to the TiDB enterprise tools download page to download the TiDB Lightning package (choose the same version as that of the TiDB cluster).
Step 3: Start tikv-importer
Upload bin/tikv-importer from the installation package.

Configure tikv-importer.toml:

# TiKV Importer configuration file template

# Log file
log-file = "tikv-importer.log"
# Log level: trace, debug, info, warn, error, off.
log-level = "info"

# Listening address of the status server.
status-server-address = "0.0.0.0:8286"

[server]
# The listening address of tikv-importer. tidb-lightning needs to connect to
# this address to write data.
addr = "0.0.0.0:8287"

[import]
# The directory to store engine files.
import-dir = "/mnt/ssd/data.import/"
The above only shows the essential settings. See the Configuration section for the full list of settings.
Run tikv-importer:

nohup ./tikv-importer -C tikv-importer.toml > nohup.out &
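Before moving on to Step 4, you can confirm that tikv-importer is accepting connections on its [server] addr. The sketch below assumes that a successful TCP connect means the process is up, which is a heuristic, not a health check defined by tikv-importer. `wait_for_port` is a hypothetical helper. The address is the one used in the tidb-lightning.toml template on this page.

```python
import socket
import time


def wait_for_port(host: str, port: int,
                  timeout_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, else False."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)


if __name__ == "__main__":
    ok = wait_for_port("172.16.31.10", 8287)
    print("tikv-importer is listening" if ok else "tikv-importer not reachable")
```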
Step 4: Start tidb-lightning
Upload bin/tidb-lightning and bin/tidb-lightning-ctl from the tool set.

Mount the data source onto the same machine.

Configure tidb-lightning.toml. For configurations that do not appear in the template below, TiDB Lightning writes a configuration error to the log file and exits.

[lightning]
# The concurrency number of data. It is set to the number of logical CPU
# cores by default. When deploying together with other components, you can
# set it to 75% of the size of logical CPU cores to limit the CPU usage.
# region-concurrency =

# Logging
level = "info"
file = "tidb-lightning.log"

[tikv-importer]
# The listening address of tikv-importer. Change it to the actual address.
addr = "172.16.31.10:8287"

[mydumper]
# mydumper local source data directory
data-source-dir = "/data/my_database"

[tidb]
# Configuration of any TiDB server from the cluster
host = "172.16.31.1"
port = 4000
user = "root"
password = ""
# Table schema information is fetched from TiDB via this status-port.
status-port = 10080
The above only shows the essential settings. See the Configuration section for the full list of settings.
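The region-concurrency comment in the template suggests about 75% of the logical CPU cores when tidb-lightning shares a machine with other components. The sketch below turns that into a number. `suggested_region_concurrency` is a hypothetical helper. On a dedicated machine you can simply leave the setting at its default, which is the number of logical CPU cores.

```python
import os
from typing import Optional


def suggested_region_concurrency(logical_cores: Optional[int] = None,
                                 share: float = 0.75) -> int:
    """Return a region-concurrency value using `share` of the logical cores."""
    if logical_cores is None:
        logical_cores = os.cpu_count() or 1  # cpu_count() may return None
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    return max(1, int(logical_cores * share))


if __name__ == "__main__":
    print(f"region-concurrency = {suggested_region_concurrency()}")
```

On a 32-core machine this prints region-concurrency = 24, a line that can replace the commented-out entry under [lightning].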
Run tidb-lightning. If you run the command directly in the command line, the process might exit because it receives the SIGHUP signal. Instead, it is preferable to run a bash script that contains the nohup command:

nohup ./tidb-lightning -config tidb-lightning.toml > nohup.out &
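If the import is driven from a Python script rather than a bash script, a similar effect can be achieved with the standard subprocess module. Note that this uses a different mechanism from nohup. Instead of ignoring SIGHUP, it starts the process in a new session via setsid(), so the process is detached from the terminal session. This is POSIX-only. `launch_detached` is a hypothetical helper, and the command mirrors the nohup line above.

```python
import subprocess
from typing import List


def launch_detached(argv: List[str], log_path: str) -> subprocess.Popen:
    """Start argv in a new session with output appended to log_path (POSIX)."""
    with open(log_path, "ab") as log:
        # start_new_session=True runs setsid() in the child, so it is not part
        # of the terminal's session and does not receive the SIGHUP sent when
        # that terminal closes. The child keeps its own copy of the log file
        # descriptor after the parent closes it on leaving the `with` block.
        return subprocess.Popen(
            argv,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,
        )


if __name__ == "__main__":
    proc = launch_detached(["./tidb-lightning", "-config", "tidb-lightning.toml"],
                           "nohup.out")
    print(f"started tidb-lightning with PID {proc.pid}")
```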