TiDB 6.0.0 Release Notes

Release date: April 7, 2022

TiDB version: 6.0.0-DMR

Note

The TiDB 6.0.0-DMR documentation has been archived. PingCAP encourages you to use the latest LTS version of the TiDB database.

In 6.0.0-DMR, the key new features or improvements are as follows:

Support placement rules in SQL to provide more flexible management for data placement.
Add a consistency check between data and indexes at the kernel level, which improves system stability and robustness, with only very low resource overhead.
Provide Top SQL, a self-serving database performance monitoring and diagnosis feature for non-experts.
Support Continuous Profiling that collects cluster performance data all the time, reducing MTTR for technical experts.
Cache hotspot small tables in memory, which greatly improves access performance, increases throughput, and reduces access latency.
Optimize in-memory pessimistic locking. When pessimistic locks are the performance bottleneck, managing them in memory can effectively reduce latency by 10% and increase QPS by 10%.
Enhance prepared statements to share execution plans, which lessens CPU resource consumption and improves SQL execution efficiency.
Improve the computing performance of the MPP engine by supporting pushing down more expressions and the general availability (GA) of the elastic thread pool.
Add DM WebUI to facilitate managing a large number of migration tasks.
Improve the stability and efficiency of TiCDC when replicating data in large clusters. TiCDC now supports replicating 100,000 tables simultaneously.
Accelerate leader balancing after restarting TiKV nodes, which improves the speed of business recovery after a restart.
Support canceling the automatic update of statistics, which reduces resource contention and limits the impact on SQL performance.
Provide PingCAP Clinic, an automatic diagnosis service for TiDB clusters (Technical Preview version).
Provide TiDB Enterprise Manager, an enterprise-level database management platform.

Also, as a core component of TiDB's HTAP solution, TiFlash™ is officially open source in this release. For details, see TiFlash repository.

Release strategy changes

Starting from TiDB v6.0.0, TiDB provides two types of releases:

Long-Term Support Releases
Long-Term Support (LTS) releases are released approximately every six months. An LTS release introduces new features and improvements, and accepts patch releases within its release lifecycle. For example, v6.1.0 will be an LTS release.
Development Milestone Releases
Development Milestone Releases (DMR) are released approximately every two months. A DMR introduces new features and improvements, but does not accept patch releases. Using a DMR in production environments is not recommended. For example, v6.0.0-DMR is a DMR.

TiDB v6.0.0 is a DMR, and its version is 6.0.0-DMR.

New features

SQL

SQL-based placement rules for data
TiDB is a distributed database with excellent scalability. Usually, data is deployed across multiple servers or even multiple data centers. Therefore, data scheduling management is one of the most important basic capabilities of TiDB. In most cases, users do not need to care about how to schedule and manage data. However, with the increasing application complexity, deployment changes caused by isolation and access latency have become new challenges for TiDB. Since v6.0.0, TiDB officially provides data scheduling and management capabilities based on SQL interfaces. It supports flexible scheduling and management in dimensions such as replica counts, role types, and placement locations for any data. TiDB also supports more flexible management for data placement in multi-service shared clusters and cross-AZ deployments.
User document
Support building TiFlash replicas by databases. To add TiFlash replicas for all tables in a database, you only need to use a single SQL statement, which greatly saves operation and maintenance costs.
User document
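As a rough illustration of these SQL interfaces, the Python sketch below only assembles the statements as strings. The policy, table, and database names and the region labels are hypothetical. The statement forms are assumed to follow the CREATE PLACEMENT POLICY, ALTER TABLE ... PLACEMENT POLICY, and ALTER DATABASE ... SET TIFLASH REPLICA syntax in the TiDB documentation, so check them against the user documents before use.

```python
from typing import Sequence


def create_policy(name: str, primary_region: str, regions: Sequence[str]) -> str:
    """Build a CREATE PLACEMENT POLICY statement (all names are illustrative)."""
    region_list = ",".join(regions)
    return (
        f"CREATE PLACEMENT POLICY {name} "
        f'PRIMARY_REGION="{primary_region}" REGIONS="{region_list}";'
    )


def attach_policy(table: str, policy: str) -> str:
    """Build the statement that binds an existing policy to a table."""
    return f"ALTER TABLE {table} PLACEMENT POLICY={policy};"


def tiflash_replicas_for_database(database: str, count: int) -> str:
    """Build the single statement that sets TiFlash replicas for a whole database."""
    if count < 0:
        raise ValueError("replica count must not be negative")
    return f"ALTER DATABASE {database} SET TIFLASH REPLICA {count};"


if __name__ == "__main__":
    # Hypothetical names: east_policy, orders, sales_db.
    print(create_policy("east_policy", "us-east-1", ["us-east-1", "us-west-1"]))
    print(attach_policy("orders", "east_policy"))
    print(tiflash_replicas_for_database("sales_db", 2))
```

The statements would then be run through any MySQL-compatible client connected to TiDB.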

Transaction

Add a check for data index consistency at the kernel level
Add a check for data index consistency when a transaction is executed, which improves system stability and robustness, with only very low resource overhead. You can control the check behavior using the tidb_enable_mutation_checker and tidb_txn_assertion_level variables. With the default configuration, the QPS drop stays within 2% in most scenarios. For the error description of the consistency check, see user document.

Observability

Top SQL: Performance diagnosis for non-experts
Top SQL is a self-serving database performance monitoring and diagnosis feature in TiDB Dashboard, for DBAs and App developers, which is now generally available in TiDB v6.0.
Unlike existing diagnostic features for experts, Top SQL is designed for non-experts: you do not need to traverse thousands of monitoring charts to find correlations or understand TiDB internal mechanisms such as Raft Snapshot, RocksDB, MVCC, and TSO. To use Top SQL for analyzing database load quickly and improving App performance, only basic database knowledge (such as index, lock conflict, and execution plans) is needed.
Top SQL is not enabled by default. When enabled, Top SQL provides you with the real-time CPU load of each TiKV or TiDB node. Therefore, you can spot SQL statements that consume a lot of CPU at a glance, and quickly analyze issues such as database hotspots and sudden load increases. For example, you can use Top SQL to pinpoint and diagnose an unusual query that consumes 90% of the CPU of a single TiKV node.
User documentation
Support Continuous Profiling
TiDB Dashboard introduces the Continuous Profiling feature, which is now generally available in TiDB v6.0. Continuous Profiling is not enabled by default. When enabled, the performance data of individual TiDB, TiKV, and PD instances is collected continuously, with negligible overhead. With historical performance data, technical experts can backtrack and pinpoint the root causes of issues like high memory consumption, even when the issues are difficult to reproduce. In this way, the mean time to recovery (MTTR) can be reduced.
User document

Performance

Cache hotspot small tables
For user applications in scenarios where hotspot small tables are accessed, TiDB supports explicitly caching the hotspot tables in memory, which greatly improves the access performance, improves the throughput, and reduces access latency. This solution can effectively avoid introducing a third-party cache middleware, reduce the complexity of the architecture, and cut the cost of operation and maintenance. The solution is suitable for scenarios where small tables are frequently accessed but rarely updated, such as the configuration tables or exchange rate tables.
User document, #25293
In-memory pessimistic locking
Since TiDB v6.0.0, in-memory pessimistic locking is enabled by default. With this feature enabled, pessimistic transaction locks are managed in memory. This avoids persisting pessimistic locks and replicating the lock information through Raft, and greatly reduces the overhead of managing pessimistic transaction locks. When pessimistic locks are the performance bottleneck, managing them in memory can effectively reduce latency by 10% and increase QPS by 10%.
User document, #11452
Optimization to get TSO at the Read Committed isolation level
To reduce query latency when read-write conflicts are rare, TiDB adds the tidb_rc_read_check_ts system variable at the Read Committed isolation level to avoid unnecessary TSO requests. This variable is disabled by default. When the variable is enabled, this optimization avoids fetching redundant TSO, which reduces latency in scenarios where there is no read-write conflict. However, in scenarios with frequent read-write conflicts, enabling this variable might cause a performance regression.
User document, #33159
Enhance prepared statements to share execution plans
Reusing SQL execution plans can effectively reduce the time for parsing SQL statements, lessen CPU resource consumption, and improve SQL execution efficiency. One of the important methods of SQL tuning is to reuse SQL execution plans effectively. TiDB has supported sharing execution plans with prepared statements. However, when the prepared statements are closed, TiDB automatically clears the corresponding plan cache. After that, TiDB might unnecessarily parse the repeated SQL statements, affecting the execution efficiency. Since v6.0.0, TiDB supports controlling whether to ignore the COM_STMT_CLOSE command through the tidb_ignore_prepared_cache_close_stmt parameter (disabled by default). When the parameter is enabled, TiDB ignores the command of closing prepared statements and keeps the execution plan in the cache, improving the reuse rate of the execution plan.
User document, #31056
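The effect of ignoring the close command can be illustrated with a toy model. The Python sketch below is not TiDB's implementation: the class, the cache keyed by SQL text, and the compilation counter are hypothetical simplifications that only show why keeping the cached plan after a close saves repeated compilation.

```python
class PlanCacheSession:
    """Toy model of a session that caches plans for prepared statements."""

    def __init__(self, ignore_close: bool = False):
        # ignore_close models tidb_ignore_prepared_cache_close_stmt (off by default).
        self.ignore_close = ignore_close
        self.plans = {}          # SQL text -> cached plan (simplified cache key)
        self.compilations = 0    # how many times a plan had to be built

    def execute(self, sql: str) -> str:
        """Return the cached plan, building it first on a cache miss."""
        if sql not in self.plans:
            self.compilations += 1
            self.plans[sql] = f"plan#{self.compilations}"
        return self.plans[sql]

    def close_statement(self, sql: str) -> None:
        """Model COM_STMT_CLOSE: drop the plan unless closes are ignored."""
        if not self.ignore_close:
            self.plans.pop(sql, None)


def compilations_for(ignore_close: bool, rounds: int = 3) -> int:
    """Run prepare/execute/close cycles and count plan compilations."""
    session = PlanCacheSession(ignore_close)
    sql = "SELECT * FROM t WHERE id = ?"
    for _ in range(rounds):
        session.execute(sql)
        session.close_statement(sql)
    return session.compilations


if __name__ == "__main__":
    print("close honored:", compilations_for(ignore_close=False))
    print("close ignored:", compilations_for(ignore_close=True))
```

In this model, an application that prepares, executes, and closes the same statement in a loop rebuilds the plan on every round unless the close is ignored.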
Improve query pushdown
With its native architecture of separating computing from storage, TiDB supports filtering out invalid data by pushing down operators, which greatly reduces the data transmission between TiDB and TiKV and thereby improves the query efficiency. In v6.0.0, TiDB supports pushing down more expressions and the BIT data type to TiKV, improving the query efficiency when computing the expressions and data type.
User document, #30738
Optimization of hotspot index
Writing monotonically increasing data in batches to a secondary index causes an index hotspot and affects the overall write throughput. Since v6.0.0, TiDB supports scattering the index hotspot using the tidb_shard function to improve the write performance. Currently, tidb_shard only takes effect on unique secondary indexes. This application-friendly solution does not require modifying the original query conditions. You can use this solution in scenarios with high write throughput, point queries, and batch point queries. Note that running range queries on the scattered data might cause a performance regression. Therefore, do not use this function in such cases without verification.
User document, #31040
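The idea behind scattering can be sketched in a few lines of Python. This is a conceptual model only: the hash function (CRC32) and the bucket count of 256 are assumptions chosen for illustration and do not reproduce the hash that tidb_shard actually uses. The point is that prefixing each index key with a hash of the value spreads consecutive values over many key ranges instead of appending them all to the same one.

```python
import zlib

SHARD_BUCKETS = 256  # assumption for illustration only


def shard(value: int) -> int:
    """Hypothetical stand-in for tidb_shard: hash a value into a bucket."""
    return zlib.crc32(str(value).encode("ascii")) % SHARD_BUCKETS


def index_key(value: int) -> tuple:
    """Model a unique index defined on (shard(a), a) instead of (a)."""
    return (shard(value), value)


def buckets_touched(values) -> int:
    """Count how many distinct shard prefixes a batch of writes lands in."""
    return len({index_key(v)[0] for v in values})


if __name__ == "__main__":
    # Monotonically increasing values, as in a batch insert of sequential IDs.
    sequential_ids = range(1_000_000, 1_001_000)
    print("distinct shard prefixes:", buckets_touched(sequential_ids))
```

A point query on a value can recompute the same prefix from that value, which is why point queries and batch point queries keep working. A range query would have to visit every prefix, which is the regression risk noted above.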
Support dynamic pruning mode for partitioned tables in TiFlash MPP engine (experimental)
In this mode, TiDB can read and compute the data on partitioned tables using the MPP engine of TiFlash, which greatly improves the query performance of partitioned tables.
User document
Improve the computing performance of the MPP engine
- Support pushing down more functions and operators to the MPP engine
  - Logical functions: IS, IS NOT
  - String functions: REGEXP(), NOT REGEXP()
  - Mathematical functions: GREATEST(int/real), LEAST(int/real)
  - Date functions: DAYNAME(), DAYOFMONTH(), DAYOFWEEK(), DAYOFYEAR(), LAST_DAY(), MONTHNAME()
  - Operators: Anti Left Outer Semi Join, Left Outer Semi Join
    User document
- The elastic thread pool (enabled by default) becomes GA. This feature aims to improve CPU utilization.
  User document

Stability

Enhance baseline capturing of execution plans
Enhance the usability of baseline capturing of execution plans by adding a blocklist with such dimensions as table name, frequency, and user name. Introduce a new algorithm to optimize memory management for caching bindings. After baseline capturing is enabled, the system automatically creates bindings for most OLTP queries. Execution plans of bound statements are fixed, avoiding performance problems due to any change in the execution plans. Baseline capturing is applicable to scenarios such as major version upgrades and cluster migration, and helps reduce performance problems caused by regression of execution plans.
User document, #32466
Support TiKV quota limiter (experimental)
If the machine on which TiKV is deployed has limited resources and the foreground is burdened by an excessively large number of requests, the foreground occupies CPU resources needed by the background, making TiKV performance unstable. In TiDB v6.0.0, you can use the quota-related configuration items to limit the resources used by the foreground, including CPU and read/write bandwidth. This greatly improves the stability of clusters under long-term heavy workloads.
User document, #12131
Support the zstd compression algorithm in TiFlash
TiFlash introduces two parameters, profiles.default.dt_compression_method and profiles.default.dt_compression_level, which allow users to select the optimal compression algorithm based on performance and capacity balance.
User document
Enable all I/O checks (Checksum) by default
This feature was introduced in v5.4.0 as experimental. It enhances data accuracy and security without imposing an obvious impact on users' businesses.
Warning: The newer data format cannot be downgraded in place to versions earlier than v5.4.0. For such a downgrade, you need to delete TiFlash replicas and replicate data again after the downgrade. Alternatively, you can perform a downgrade by referring to dttool migrate.
User document
Improve thread utilization
TiFlash introduces asynchronous gRPC and Min-TSO scheduling mechanisms. Such mechanisms ensure more efficient use of threads and avoid system crashes caused by excessive threads.
User document

Data migration

TiDB Data Migration (DM)

Add WebUI (experimental)
With the WebUI, you can easily manage a large number of migration tasks. On the WebUI, you can:
- View migration tasks on Dashboard
- Manage migration tasks
- Configure upstream settings
- Query replication status
- View master and worker information
  WebUI is experimental and still under development. Therefore, it is recommended only for trial. A known issue is that problems might occur if you use both WebUI and dmctl to operate the same task. This issue will be resolved in later versions.
  User document
Add an error handling mechanism
More commands are introduced to address problems that interrupt a migration task. For example:
- In case of a schema error, you can update the schema file by using the --from-source/--from-target parameter of the binlog-schema update command, instead of editing the schema file separately.
- You can specify a binlog position to inject, replace, skip, or revert a DDL statement.
  User document
Support full data storage to Amazon S3
When DM performs all or full data migration tasks, sufficient hard disk space is required for storing full data from upstream. Compared with EBS, Amazon S3 has nearly infinite storage at lower costs. Now, DM supports configuring Amazon S3 as the dump directory. That means you can use S3 to store full data when you perform all or full data migration tasks.
User document
Support starting a migration task from specified time
A new parameter --start-time is added to migration tasks. You can define time in the format of '2021-10-21 00:01:00' or '2021-10-21T00:01:00'.
This feature is particularly useful in scenarios where you migrate and merge incremental data from sharded MySQL instances. Specifically, you do not need to set a binlog start point for each source in an incremental migration task. Instead, you can create an incremental migration task quickly by using the --start-time parameter in safe mode.
User document
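To make the two accepted time formats concrete, the Python sketch below parses both of them into the same timestamp. It is not part of DM: the helper name and the error message are hypothetical, and it only demonstrates that the space-separated and T-separated forms describe the same point in time.

```python
from datetime import datetime

# The two layouts named above: '2021-10-21 00:01:00' and '2021-10-21T00:01:00'.
ACCEPTED_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%dT%H:%M:%S")


def parse_start_time(text: str) -> datetime:
    """Parse a --start-time style value, trying each accepted layout in turn."""
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unsupported start time: {text!r}")


if __name__ == "__main__":
    print(parse_start_time("2021-10-21 00:01:00"))
    print(parse_start_time("2021-10-21T00:01:00"))
```

A script that generates migration tasks could use a check like this to validate the value before passing it on.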

TiDB Lightning

Support configuring the maximum number of tolerable errors
Add a configuration item lightning.max-error. The default value is 0. When the value is greater than 0, the max-error feature is enabled. If an error occurs in a row during encoding, a record containing this row is added to lightning_task_info.type_error_v1 in the target TiDB and this row is ignored. When the number of rows with errors exceeds the threshold, TiDB Lightning exits immediately.
Matching the lightning.max-error configuration, the lightning.task-info-schema-name configuration item records the name of the database that reports a data saving error.
This feature does not cover all types of errors. For example, syntax errors are not covered.
User document
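The threshold behavior can be modeled with a short Python sketch. This is a conceptual illustration, not TiDB Lightning code: the function, the exception, and the use of int() as a stand-in for row encoding are all hypothetical. It shows rows with errors being recorded and skipped until their count exceeds the threshold, at which point the import stops.

```python
class TooManyErrors(Exception):
    """Raised when the number of bad rows exceeds the tolerated maximum."""


def import_rows(rows, max_error=0):
    """Encode rows, tolerating up to max_error bad rows before aborting.

    Returns (imported_values, error_records). int() stands in for the
    type-checked encoding step; a real importer does far more than this.
    """
    imported = []
    error_records = []  # models the rows recorded in the error table
    for row in rows:
        try:
            imported.append(int(row))
        except ValueError:
            error_records.append(row)
            if len(error_records) > max_error:
                raise TooManyErrors(
                    f"{len(error_records)} bad rows exceed max_error={max_error}"
                )
    return imported, error_records


if __name__ == "__main__":
    good, bad = import_rows(["1", "oops", "2"], max_error=1)
    print("imported:", good, "recorded errors:", bad)
```

With the default of 0, the first bad row aborts the import in this model, which matches the feature being disabled by default.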

Support replicating 100,000 tables simultaneously
By optimizing the data processing flow, TiCDC reduces the resource consumption of processing incremental data for each table, which greatly improves the replication stability and efficiency when replicating data in large clusters. The result of an internal test shows that TiCDC can stably support replicating 100,000 tables simultaneously.

Deployment and maintenance

Enable new collation rules by default
Since v4.0, TiDB has supported new collation rules that behave the same way as MySQL in the case-insensitive, accent-insensitive, and padding rules. The new collation rules are controlled by the new_collations_enabled_on_first_bootstrap parameter, which was disabled by default. Since v6.0, TiDB enables the new collation rules by default. Note that this configuration takes effect only upon TiDB cluster initialization.
User documentation
Accelerate leader balancing after restarting TiKV nodes
After TiKV nodes restart, the unevenly scattered leaders must be redistributed for load balance. In large-scale clusters, leader balancing time is positively correlated with the number of Regions. For example, balancing the leaders of 100K Regions can take 20-30 minutes, during which the uneven load can cause performance issues and stability risks. TiDB v6.0.0 provides a parameter to control the balancing concurrency and increases its default value to four times the original, which greatly shortens the leader rebalancing time and accelerates business recovery after TiKV nodes restart.
User documentation, #4610
Support canceling the automatic update of statistics
Statistics are among the most important basic data that affect SQL performance. To ensure the completeness and timeliness of statistics, TiDB automatically updates object statistics periodically in the background. However, automatic statistics updates might result in resource contention, affecting SQL performance. To address this issue, since v6.0 you can manually cancel the automatic update of statistics.
User documentation
PingCAP Clinic diagnostic service (Technical Preview version)
PingCAP Clinic is a diagnostic service for TiDB clusters. This service helps troubleshoot cluster issues remotely and provides a quick check of cluster status locally. With PingCAP Clinic, you can ensure the stable operation of your TiDB cluster during its full life cycle, predict potential issues, reduce the probability of issues, and quickly troubleshoot cluster issues.
When contacting PingCAP technical support for remote assistance to troubleshoot cluster issues, you can use the PingCAP Clinic service to collect and upload diagnostic data, thereby improving the troubleshooting efficiency.
User documentation
An enterprise-level database management platform, TiDB Enterprise Manager
TiDB Enterprise Manager (TiEM) is an enterprise-level database management platform based on the TiDB database, which aims to help users manage TiDB clusters in self-hosted or public cloud environments.
TiEM not only provides full lifecycle visual management for TiDB clusters, but also provides one-stop services: parameter management, version upgrades, cluster clone, active-standby cluster switching, data import and export, data replication, and data backup and restore services. TiEM can improve the efficiency of DevOps on TiDB and reduce the DevOps cost for enterprises.
Currently, TiEM is provided in the TiDB Enterprise edition only. To get TiEM, contact us via the TiDB Enterprise page.
Support customizing configurations of the monitoring components
When you deploy a TiDB cluster using TiUP, TiUP automatically deploys monitoring components such as Prometheus, Grafana, and Alertmanager, and automatically adds new nodes into the monitoring scope after scale-out. You can customize the configurations of the monitoring components by adding configuration items to the topology.yaml file.
User document

Compatibility changes