📣
TiDB Cloud Premium is now in public preview. Unlimited growth, instant elasticity, advanced security for enterprise workloads. Try it out →

VACUUM TABLE



The VACUUM TABLE command helps optimize system performance by freeing up storage space through the permanent removal of historical data files from a table. This includes:

  • Snapshots associated with the table, as well as their relevant segments and blocks.

  • Orphan files. Orphan files in TiDB Cloud Lake refer to snapshots, segments, and blocks that are no longer associated with the table. Orphan files might be generated from various operations and errors, such as during data backups and restores, and can take up valuable disk space and degrade the system performance over time.

See also: VACUUM DROP TABLE

Syntax and Examples

VACUUM TABLE <table_name> [ DRY RUN [SUMMARY] ]
  • DRY RUN [SUMMARY]: When this parameter is specified, candidate orphan files will not be removed. Instead, a list of up to 1,000 candidate files and their sizes (in bytes) will be returned, showing what would have been removed if the option was not used. When the optional parameter SUMMARY is included, the command returns the total number of files to be removed and their combined size in bytes.

Output

The VACUUM TABLE command (without DRY RUN) returns a table summarizing vital statistics of the vacuumed files, containing the following columns:

ColumnDescription
snapshot_filesNumber of snapshot files
snapshot_sizeTotal size of snapshot files in bytes
segments_filesNumber of segment files
segments_sizeTotal size of segment files in bytes
block_filesNumber of block files
block_sizeTotal size of block files in bytes
index_filesNumber of index files
index_sizeTotal size of index files in bytes
total_filesTotal number of all types of files
total_sizeTotal size of all types of files in bytes
// highlight-next-line VACUUM TABLE c; ┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ snapshot_files │ snapshot_size │ segments_files │ segments_size │ block_files │ block_size │ index_files │ index_size │ total_files │ total_size │ ├────────────────┼───────────────┼────────────────┼───────────────┼─────────────┼────────────┼─────────────┼────────────┼─────────────┼────────────┤ │ 319549480291890930603011706 │ └──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

When the DRY RUN parameter is specified with the VACUUM TABLE command, it returns a list of up to 1,000 candidate files and their sizes in bytes. If DRY RUN SUMMARY is specified, the command provides the total number of files to be removed and their combined size.

// highlight-next-line VACUUM TABLE c DRY RUN; ┌──────────────────────────────────────────────────────────────┐ │ file │ file_size │ ├──────────────────────────────────────────────────┼───────────┤ │ 1/67/_ss/61aaf678b9af41568b539099b4b09908_v4.mpk │ 543 │ │ 1/67/_ss/dd149d21151c459d8c87076f9412c12c_v4.mpk │ 516 │ │ 1/67/_ss/7ba0b2e2f63c4d42897a48830027dcf3_v4.mpk │ 462 │ │ 1/67/_ss/db55dac72b29452db976cf0af0f8d962_v4.mpk │ 588 │ │ 1/67/_ss/d8055967298f478d97cddaa66cf67e11_v4.mpk │ 563 │ │ 1/67/_ss/00c4288dac014760808006f821f1ecbe_v4.mpk │ 609 │ └──────────────────────────────────────────────────────────────┘ // highlight-next-line VACUUM TABLE c DRY RUN SUMMARY; ┌──────────────────────────┐ │ total_files │ total_size │ ├─────────────┼────────────┤ │ 63281 │ └──────────────────────────┘

Adjusting Data Retention Time

The VACUUM TABLE command removes data files older than the data_retention_time_in_days setting. This retention period can be adjusted as needed, for example, to 2 days:

SET GLOBAL data_retention_time_in_days = 2;

data_retention_time_in_days defaults to 1 day (24 hours), and the maximum value varies across TiDB Cloud Lake editions:

EditionDefault RetentionMax. Retention
TiDB Cloud Lake Community & Enterprise Editions1 day (24 hours)90 days
TiDB Cloud Lake (Personal)1 day (24 hours)1 day (24 hours)
TiDB Cloud Lake (Business)1 day (24 hours)90 days

To check the current value of data_retention_time_in_days:

SHOW SETTINGS LIKE 'data_retention_time_in_days';

VACUUM TABLE vs. OPTIMIZE TABLE

TiDB Cloud Lake provides two commands for removing historical data files from a table: VACUUM TABLE and OPTIMIZE TABLE (with the PURGE option). Although both commands are capable of permanently deleting data files, they differ in how they handle orphan files: OPTIMIZE TABLE is able to remove orphan snapshots, as well as the corresponding segments and blocks. However, there is a possibility of orphan segments and blocks existing without any associated snapshots. In such a scenario, only VACUUM TABLE can help clean them up.

Both VACUUM TABLE and OPTIMIZE TABLE allow you to specify a period to determine which historical data files to remove. However, OPTIMIZE TABLE requires you to obtain the snapshot ID or timestamp from a query beforehand, whereas VACUUM TABLE allows you to specify the number of hours to retain the data files directly. VACUUM TABLE provides enhanced control over your historical data files both before their removal with the DRY RUN option, which allows you to preview the data files to be removed before applying the command. This provides a safe removal experience and helps you avoid unintended data loss.

VACUUM TABLEOPTIMIZE TABLE
Associated snapshots (incl. segments and blocks)YesYes
Orphan snapshots (incl. segments and blocks)YesYes
Orphan segments and blocks onlyYesNo
DRY RUNYesNo

Was this page helpful?