Batch Processing

Batch processing is a common and essential operation in real-world scenarios. It enables efficient handling of large datasets for tasks such as data migration, bulk imports, archiving, and large-scale updates.

To optimize performance for batch operations, TiDB introduces various features over its version evolution:

  • Data import
    • IMPORT INTO statement (introduced in TiDB v7.2.0 and GA in v7.5.0)
  • Data inserts, updates, and deletions
    • Pipelined DML (experimental, introduced in TiDB v8.0.0)
    • Non-transactional DML (introduced in TiDB v6.1.0)
    • Batch-dml (deprecated)

This document outlines the key benefits, limitations, and use cases of these features to help you choose the most suitable solution for efficient batch processing.

Data import

The IMPORT INTO statement is designed for data import tasks. It enables you to quickly import data in formats such as CSV, SQL, or PARQUET into an empty TiDB table, without the need to deploy TiDB Lightning separately.

Key benefits

  • Extremely fast import speed
  • Easier to use compared to TiDB Lightning

Limitations

  • No transactional ACID guarantees
  • Subject to various usage restrictions

Use cases

  • Suitable for data import scenarios such as data migration or recovery. It is recommended to use IMPORT INTO instead of TiDB Lightning where applicable.

For more information, see IMPORT INTO.

Data inserts, updates, and deletions

Pipelined DML

Pipelined DML is an experimental feature introduced in TiDB v8.0.0. In v8.5.0, the feature is enhanced with significant performance improvements.

Key benefits

  • Streams data to the storage layer during transaction execution instead of buffering it entirely in memory, allowing transaction size no longer limited by TiDB memory and supporting ultra-large-scale data processing
  • Achieves better performance compared to standard DML
  • Can be enabled through system variables without SQL modifications

Limitations

  • Only supports autocommit INSERT, REPLACE, UPDATE, and DELETE statements.

Use cases

  • Suitable for general batch processing tasks, such as bulk data inserts, updates, and deletions.

For more information, see Pipelined DML.

Non-transactional DML statements

Non-transactional DML is introduced in TiDB v6.1.0. Initially, only the DELETE statement supports this feature. Starting from v6.5.0, INSERT, REPLACE, and UPDATE statements also support this feature.

Key benefits

  • Splits a single SQL statement into multiple smaller statements, bypassing memory limitations.
  • Achieves performance that is slightly faster or comparable to standard DML.

Limitations

  • Only supports autocommit statements
  • Requires modifications to SQL statements
  • Imposes strict requirements on SQL syntax; some statements might need rewriting
  • Lacks full transactional ACID guarantees; in case of failures, partial execution of a statement might occur

Use cases

  • Suitable for scenarios involving bulk data inserts, updates, and deletions. Due to its limitations, it is recommended to consider non-transactional DML only when Pipelined DML is not applicable.

For more information, see Non-transactional DML.

Deprecated batch-dml feature

The batch-dml feature, available in TiDB versions prior to v4.0, is now deprecated and no longer recommended. This feature is controlled by the following system variables:

  • tidb_batch_insert
  • tidb_batch_delete
  • tidb_batch_commit
  • tidb_enable_batch_dml
  • tidb_dml_batch_size

Due to the risk of data corruption or loss caused by inconsistent data and indexes, these variables have been deprecated and are planned for removal in future releases.

It is NOT RECOMMENDED to use the deprecated batch-dml feature under any circumstances. Instead, consider other alternative features outlined in this document.

Was this page helpful?