TiFlash Data validation
This document introduces the data validation mechanism and tools for TiFlash.
Data corruptions are usually caused by serious hardware failures. In such cases, even if you attempt to manually recover data, your data become less reliable.
To ensure data integrity, by default, TiFlash performs basic data validation on data files, using the
City128 algorithm. In the event of any data validation failure, TiFlash immediately reports an error and exits, avoiding secondary disasters caused by inconsistent data. At this time, you need to manually intervene and replicate the data again before you can restore the TiFlash node.
Starting from v5.4.0, TiFlash introduces more advanced data validation features. TiFlash uses the
XXH3 algorithm by default and allows you to customize the validation frame and algorithm.
The validation mechanism builds upon the DeltaTree File (DTFile). DTFile is the storage file that persists TiFlash data. DTFile has three formats:
|V1||Deprecated||Hashes are embedded in data files.|
|V2||Default for versions < v6.0.0||Hashes are embedded in data files.||Compared to V1, V2 adds statistics of column data.|
|V3||Default for versions >= v6.0.0||V3 contains metadata and token data checksum, and supports multiple hash algorithms.||New in v5.4.0.|
DTFile is stored in the
stable folder in the data file directory. All formats currently enabled are in folder format, which means the data is stored in multiple files under a folder with a name like
Use data validation
TiFlash supports both automatic and manual data validation:
- Automatic data validation:
- v6.0.0 and later versions use the V3 validation mechanism by default.
- Versions earlier than v6.0.0 use the V2 validation mechanism by default.
- To manually switch the validation mechanism, refer to TiFlash configuration file. However, the default configuration is verified by tests and therefore recommended.
- Manual data validation. Refer to
In addition to automatic data validation performed when TiFlash reads data, a tool for manually checking data integrity is introduced in v5.4.0. For details, refer to DTTool.