Batch Processing

Batch processing is a common and essential operation in real-world scenarios. It enables efficient handling of large datasets for tasks such as data migration, bulk imports, archiving, and large-scale updates.

To optimize performance for batch operations, TiDB introduces various features over its version evolution:

Data import
- IMPORT INTO statement (introduced in TiDB v7.2.0 and GA in v7.5.0)
Data inserts, updates, and deletions
- Pipelined DML (experimental, introduced in TiDB v8.0.0)
- Non-transactional DML (introduced in TiDB v6.1.0)
- Batch-dml (deprecated)

This document outlines the key benefits, limitations, and use cases of these features to help you choose the most suitable solution for efficient batch processing.

Data import

The IMPORT INTO statement is designed for data import tasks. It enables you to quickly import data in formats such as CSV, SQL, or PARQUET into an empty TiDB table, without the need to deploy TiDB Lightning separately.

Key benefits

Extremely fast import speed
Easier to use compared to TiDB Lightning

Limitations

No transactional ACID guarantees
Subject to various usage restrictions

Use cases

Suitable for data import scenarios such as data migration or recovery. It is recommended to use IMPORT INTO instead of TiDB Lightning where applicable.

For more information, see IMPORT INTO.

Data inserts, updates, and deletions

Pipelined DML

Pipelined DML is an experimental feature introduced in TiDB v8.0.0. In v8.5.0, the feature is enhanced with significant performance improvements.

Key benefits

Streams data to the storage layer during transaction execution instead of buffering it entirely in memory, allowing transaction size no longer limited by TiDB memory and supporting ultra-large-scale data processing
Achieves better performance compared to standard DML
Can be enabled through system variables without SQL modifications

Limitations

Only supports autocommit INSERT, REPLACE, UPDATE, and DELETE statements.

Use cases

Suitable for general batch processing tasks, such as bulk data inserts, updates, and deletions.

For more information, see Pipelined DML.

Non-transactional DML statements

Non-transactional DML is introduced in TiDB v6.1.0. Initially, only the DELETE statement supports this feature. Starting from v6.5.0, INSERT, REPLACE, and UPDATE statements also support this feature.

Key benefits

Splits a single SQL statement into multiple smaller statements, bypassing memory limitations.
Achieves performance that is slightly faster or comparable to standard DML.

Limitations

Only supports autocommit statements
Requires modifications to SQL statements
Imposes strict requirements on SQL syntax; some statements might need rewriting
Lacks full transactional ACID guarantees; in case of failures, partial execution of a statement might occur

Use cases

Suitable for scenarios involving bulk data inserts, updates, and deletions. Due to its limitations, it is recommended to consider non-transactional DML only when Pipelined DML is not applicable.

For more information, see Non-transactional DML.

Deprecated batch-dml feature

The batch-dml feature, available in TiDB versions prior to v4.0, is now deprecated and no longer recommended. This feature is controlled by the following system variables:

tidb_batch_insert
tidb_batch_delete
tidb_batch_commit
tidb_enable_batch_dml
tidb_dml_batch_size

Due to the risk of data corruption or loss caused by inconsistent data and indexes, these variables have been deprecated and are planned for removal in future releases.

It is NOT RECOMMENDED to use the deprecated batch-dml feature under any circumstances. Instead, consider other alternative features outlined in this document.

Batch Processing

Data import

Key benefits

Limitations

Use cases

Data inserts, updates, and deletions

Pipelined DML

Key benefits

Limitations

Use cases

Non-transactional DML statements

Key benefits

Limitations

Use cases

Deprecated batch-dml feature

Was this page helpful?