TiDB 8.4.0 Release Notes
Release date: November 11, 2024
TiDB version: 8.4.0
Quick access: Quick start
8.4.0 introduces the following key features and improvements:
Category | Feature/Enhancement | Description |
---|---|---|
Scalability and Performance | Instance-level execution plan cache (experimental) | Instance-level plan cache allows all sessions within the same TiDB instance to share the plan cache. Compared with the session-level plan cache, this feature reduces SQL compilation time by caching more execution plans in memory, decreasing overall SQL execution time. It improves OLTP performance and throughput while providing better control over memory usage and enhancing database stability. |
Scalability and Performance | Global indexes for partitioned tables (GA) | Global indexes can effectively improve the efficiency of retrieving non-partitioned columns, and remove the restriction that a unique key must contain the partition key. This feature extends the usage scenarios of TiDB partitioned tables, and avoids some of the application modification work required for data migration. |
Scalability and Performance | Parallel mode for TSO requests | In high-concurrency scenarios, you can use this feature to reduce the wait time for retrieving TSO and improve the cluster throughput. |
Scalability and Performance | Improve query performance for cached tables | Improve query performance for index scanning on cached tables, with improvements of up to 5.4 times in some scenarios. For high-speed queries on small tables, using cached tables can significantly enhance overall performance. |
Reliability and Availability | Support more triggers for runaway queries, and support switching resource groups | Runaway queries offer an effective way to mitigate the impact of unexpected SQL performance issues on systems. TiDB v8.4.0 introduces the number of keys processed by the Coprocessor (`PROCESSED_KEYS`) and request units (`RU`) as identifying conditions, and puts identified queries into the specified resource group for more precise identification and control of runaway queries. |
Reliability and Availability | Support setting the maximum limit on resource usage for background tasks of resource control | By setting a maximum percentage limit on background tasks of resource control, you can control their resource consumption based on the needs of different application systems. This keeps background task consumption at a low level and ensures the quality of online services. |
Reliability and Availability | TiProxy supports traffic capture and replay (experimental) | Use TiProxy to capture real workloads from TiDB production clusters before major operations such as cluster upgrades, migrations, or deployment changes. Replay these workloads on target test clusters to validate performance and ensure successful changes. |
Reliability and Availability | Concurrent automatic statistics collection | You can set the concurrency within a single automatic statistics collection task using the system variable `tidb_auto_analyze_concurrency`. TiDB automatically determines the concurrency of scanning tasks based on node scale and hardware specifications. This improves statistics collection efficiency by fully utilizing system resources, reduces manual tuning, and ensures stable cluster performance. |
SQL | Vector search (experimental) | Vector search is a search method based on data semantics, which provides more relevant search results. As one of the core functions of AI and large language models (LLMs), vector search can be used in various scenarios such as Retrieval-Augmented Generation (RAG), semantic search, and recommendation systems. |
DB Operations and Observability | Display TiKV and TiDB CPU times in memory tables | The CPU time is now integrated into a system table, displayed alongside other metrics for sessions or SQL, letting you observe high CPU consumption operations from multiple perspectives and improving diagnostic efficiency. This is especially useful for diagnosing scenarios such as CPU spikes in instances or read/write hotspots in clusters. |
DB Operations and Observability | Support viewing aggregated TiKV CPU time by table or database | When hotspot issues are not caused by individual SQL statements, using the aggregated CPU time by table or database level in Top SQL can help you quickly identify the tables or applications responsible for the hotspots, significantly improving the efficiency of diagnosing hotspot and CPU consumption issues. |
DB Operations and Observability | Support backing up TiKV instances with the IMDSv2 service enabled | AWS EC2 now uses IMDSv2 as the default metadata service. TiDB supports backing up data from TiKV instances that have IMDSv2 enabled, helping you run TiDB clusters more effectively in public cloud services. |
Security | Client-side encryption of log backup data (experimental) | Before uploading log backup data to your backup storage, you can encrypt the backup data to ensure its security during storage and transmission. |
Feature details
Performance
Introduce parallel batching modes for TSO requests, reducing TSO retrieval latency #54960 #8432 @MyonKeminta
Before v8.4.0, when requesting TSO from PD, TiDB collects multiple TSO requests during a specific period and processes them in batches serially to decrease the number of Remote Procedure Call (RPC) requests and reduce the PD workload. In latency-sensitive scenarios, however, the performance of this serial batching mode is not ideal.

In v8.4.0, TiDB introduces parallel batching modes for TSO requests with different concurrency capabilities. Parallel modes reduce TSO retrieval latency but might increase the PD workload. To set a parallel RPC mode for retrieving TSO, configure the `tidb_tso_client_rpc_mode` system variable.

For more information, see documentation.
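The following is a minimal sketch of switching the mode. The value names (`DEFAULT`, `PARALLEL`, `PARALLEL-FAST`) follow the v8.4.0 documentation; verify them against your deployment before relying on them:

```sql
-- Check the current TSO RPC mode, then switch to a parallel mode.
SHOW VARIABLES LIKE 'tidb_tso_client_rpc_mode';
SET GLOBAL tidb_tso_client_rpc_mode = 'PARALLEL';       -- concurrent TSO RPC requests, lower latency
-- SET GLOBAL tidb_tso_client_rpc_mode = 'PARALLEL-FAST'; -- more concurrency, higher PD workload
-- SET GLOBAL tidb_tso_client_rpc_mode = 'DEFAULT';       -- restore serial batching
```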
Optimize the execution efficiency of the hash join operator for TiDB (experimental) #55153 #53127 @windtalker @xzhangxian1008 @XuHuaiyu @wshwsh12
In v8.4.0, TiDB introduces an optimized version of the hash join operator to improve its execution efficiency. Currently, the optimized version of the hash join applies only to inner join and outer join operations and is disabled by default. To enable this optimized version, set the `tidb_hash_join_version` system variable to `optimized`.

For more information, see documentation.
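For illustration, a minimal sketch of trying the optimized operator in one session; `t1` and `t2` are hypothetical tables:

```sql
-- Enable the optimized hash join for the current session only.
SET SESSION tidb_hash_join_version = 'optimized';
EXPLAIN SELECT /*+ HASH_JOIN(t1, t2) */ * FROM t1 JOIN t2 ON t1.id = t2.id;
-- Revert to the original implementation.
SET SESSION tidb_hash_join_version = 'legacy';
```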
Support pushing down the following date functions to TiKV #56297 #17529 @gengliqi
- `DATE_ADD()`
- `DATE_SUB()`
- `ADDDATE()`
- `SUBDATE()`

For more information, see documentation.
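As a hedged illustration (the `orders` table and its columns are hypothetical), a filter containing one of these functions can now be evaluated by TiKV coprocessors instead of in TiDB:

```sql
-- If pushdown takes effect, the Selection operator containing DATE_ADD()
-- appears under a cop[tikv] task in the EXPLAIN output.
EXPLAIN SELECT id
FROM orders
WHERE DATE_ADD(created_at, INTERVAL 7 DAY) < NOW();
```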
Support instance-level execution plan cache (experimental) #54057 @qw4990
Instance-level execution plan cache allows all sessions within the same TiDB instance to share the execution plan cache. This feature significantly reduces TiDB query response time, increases cluster throughput, decreases the possibility of execution plan mutations, and maintains stable cluster performance. Compared with session-level execution plan cache, instance-level execution plan cache offers the following advantages:
- Eliminates redundancy, caching more execution plans with the same memory consumption.
- Allocates a fixed-size memory on the instance, limiting memory usage more effectively.
In v8.4.0, instance-level execution plan cache only supports caching query execution plans and is disabled by default. You can enable this feature using `tidb_enable_instance_plan_cache` and set its maximum memory usage using `tidb_instance_plan_cache_max_size`. Before enabling this feature, disable Prepared execution plan cache and Non-prepared execution plan cache.

For more information, see documentation.
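A minimal sketch of turning the feature on, assuming the two session-level plan caches have been disabled first as required; the size value is in bytes and is illustrative:

```sql
SET GLOBAL tidb_enable_prepared_plan_cache = OFF;      -- disable Prepared plan cache
SET GLOBAL tidb_enable_non_prepared_plan_cache = OFF;  -- disable Non-prepared plan cache
SET GLOBAL tidb_enable_instance_plan_cache = ON;
SET GLOBAL tidb_instance_plan_cache_max_size = 104857600;  -- 100 MiB upper bound, adjust as needed
```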
TiDB Lightning's logical import mode supports prepared statements and client statement cache #54850 @dbsid
By enabling the `logical-import-prep-stmt` configuration item, the SQL statements executed in TiDB Lightning's logical import mode use prepared statements and the client statement cache. This reduces the cost of SQL parsing and compilation in TiDB, improves SQL execution efficiency, and increases the likelihood of hitting the execution plan cache, thereby speeding up logical import.

For more information, see documentation.
Partitioned tables support global indexes (GA) #45133 @mjonss @Defined2014 @jiyfhust @L-maple
In earlier TiDB versions, partitioned tables have some limitations because global indexes are not supported. For example, a unique key must use every column in the table's partition expression, and if the query condition does not use the partition key, the query scans all partitions, resulting in poor performance. Starting from v7.6.0, the system variable `tidb_enable_global_index` is introduced to enable the global index feature. However, this feature was still under development at that time, and enabling it was not recommended.

Starting from v8.3.0, the global index feature is released as an experimental feature. You can explicitly create a global index for a partitioned table with the `GLOBAL` keyword. This removes the restriction that a unique key in a partitioned table must include all columns used in the partition expression, allowing for more flexible application requirements. Additionally, global indexes also improve the performance of queries based on non-partitioned columns.

In v8.4.0, this feature becomes generally available (GA). You can use the `GLOBAL` keyword to create a global index without setting the `tidb_enable_global_index` system variable. Starting from v8.4.0, this system variable is deprecated and is always `ON`.

For more information, see documentation.
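A minimal sketch using the documented `GLOBAL` keyword; the table is illustrative. The unique key on `uid` would be invalid without `GLOBAL` because `uid` is not part of the partition expression:

```sql
CREATE TABLE orders (
    id  BIGINT NOT NULL,
    uid BIGINT NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY uk_uid (uid) GLOBAL  -- global index spanning all partitions
)
PARTITION BY HASH (id) PARTITIONS 4;
```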
Improve query performance for cached tables in some scenarios #43249 @tiancaiamao
In v8.4.0, TiDB improves the query performance of cached tables by up to 5.4 times when executing `SELECT ... LIMIT 1` statements with `IndexLookup`. In addition, TiDB improves the performance of `IndexLookupReader` in full table scan and primary key query scenarios.
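For context, a minimal sketch of the cached-table usage that benefits from this improvement; `config_kv` is a hypothetical small, mostly-read table:

```sql
ALTER TABLE config_kv CACHE;   -- cache the table in TiDB memory
SELECT v FROM config_kv WHERE k = 'feature_flag' LIMIT 1;
ALTER TABLE config_kv NOCACHE; -- revert when caching is no longer needed
```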
Reliability
Runaway queries support the number of processed keys and request units as thresholds #54434 @HuSharp
Starting from v8.4.0, TiDB can identify runaway queries based on the number of processed keys (`PROCESSED_KEYS`) and request units (`RU`). Compared with execution time (`EXEC_ELAPSED`), these new thresholds define the resource consumption of queries more accurately, avoiding identification bias when overall performance decreases. You can set multiple conditions simultaneously, and a query is identified as a runaway query if any condition is met.

You can observe the corresponding fields (`RESOURCE_GROUP`, `MAX_REQUEST_UNIT_WRITE`, `MAX_REQUEST_UNIT_READ`, `MAX_PROCESSED_KEYS`) in the Statement Summary Tables to determine the condition values based on historical executions.

For more information, see documentation.
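A hedged sketch based on the documented `QUERY_LIMIT` syntax; the group name and threshold values are illustrative:

```sql
-- Treat any query in rg1 that scans more than 1,000,000 keys or consumes
-- more than 10,000 RUs as a runaway query, and kill it.
ALTER RESOURCE GROUP rg1
    QUERY_LIMIT = (PROCESSED_KEYS = 1000000, RU = 10000, ACTION = KILL);
```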
Support switching resource groups for runaway queries #54434 @JmPotato
Starting from TiDB v8.4.0, you can switch runaway queries to a specific resource group. If the `COOLDOWN` mechanism fails to lower resource consumption, you can create a resource group, limit its resource size, and set the `SWITCH_GROUP` parameter to move identified runaway queries to this group. Meanwhile, subsequent queries within the same session continue to execute in the original resource group. By switching resource groups, you can manage resource usage more precisely and control resource consumption more strictly.

For more information, see documentation.
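A hedged sketch of the documented `SWITCH_GROUP` action; group names and limits are illustrative:

```sql
-- A small, tightly limited group to absorb runaway queries.
CREATE RESOURCE GROUP rg_limited RU_PER_SEC = 500;
-- Move queries in rg1 that run longer than 60 seconds into rg_limited.
ALTER RESOURCE GROUP rg1
    QUERY_LIMIT = (EXEC_ELAPSED = '60s', ACTION = SWITCH_GROUP(rg_limited));
```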
Support setting the cluster-level Region scattering strategy using the `tidb_scatter_region` system variable #55184 @D3Hunter

Before v8.4.0, the `tidb_scatter_region` system variable can only be enabled or disabled. When it is enabled, TiDB applies a table-level scattering strategy during batch table creation. However, when creating hundreds of thousands of tables in a batch, this strategy results in a concentration of Regions on a few TiKV nodes, causing OOM (Out of Memory) issues in those nodes.

Starting from v8.4.0, `tidb_scatter_region` is changed to the string type. It now supports a cluster-level scattering strategy, which can help avoid TiKV OOM issues in the preceding scenario.

For more information, see documentation.
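A minimal sketch, assuming the enumeration values listed in the compatibility table later in these notes (`TABLE` and `GLOBAL`, with an empty value disabling scattering):

```sql
SET GLOBAL tidb_scatter_region = 'GLOBAL';  -- cluster-level scattering for batch table creation
-- SET GLOBAL tidb_scatter_region = 'TABLE'; -- pre-v8.4.0 table-level behavior
-- SET GLOBAL tidb_scatter_region = '';      -- disable scattering (default)
```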
Support setting the maximum limit on resource usage for background tasks of resource control #56019 @glorv
TiDB resource control can identify and lower the priority of background tasks. In certain scenarios, you might want to limit the resource consumption of background tasks, even when resources are available. Starting from v8.4.0, you can use the `UTILIZATION_LIMIT` parameter to set the maximum percentage of resources that background tasks can consume. Each node keeps the resource usage of all background tasks below this percentage. This feature enables precise control over resource consumption for background tasks, further enhancing cluster stability.

For more information, see documentation.
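A hedged sketch based on the documented `BACKGROUND` syntax; the task types and percentage are illustrative:

```sql
-- Cap background BR and statistics tasks at 30% of each node's resources.
ALTER RESOURCE GROUP default
    BACKGROUND = (TASK_TYPES = 'br,stats', UTILIZATION_LIMIT = 30);
```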
Optimize the resource allocation strategy of resource groups #50831 @nolouch
TiDB improves the resource allocation strategy in v8.4.0 to better meet user expectations for resource management.
- Controlling the resource allocation of large queries at runtime to avoid exceeding the resource group limit, combined with the runaway queries `COOLDOWN` mechanism. This can help identify and reduce the concurrency of large queries, and reduce instantaneous resource consumption.
- Adjusting the default priority scheduling strategy. When tasks of different priorities run simultaneously, high-priority tasks receive more resources.
Availability
TiProxy supports traffic replay (experimental) #642 @djshow832
Starting from TiProxy v1.3.0, you can use `tiproxyctl` to connect to a TiProxy instance, capture access traffic in a TiDB production cluster, and replay it in a test cluster at a specified rate. This feature enables you to reproduce actual workloads from the production cluster in a test environment, verifying SQL statement execution results and performance.

Traffic replay is useful in the following scenarios:
- Verify TiDB version upgrades
- Assess change impact
- Validate performance before scaling TiDB
- Test performance limits
For more information, see documentation.
SQL
Support vector search (experimental) #54245 #17290 #9032 @breezewish @Lloyd-Pottiger @EricZequan @zimulala @JaySon-Huang @winoros @wk989898
Vector search is a search method based on data semantics, which provides more relevant search results. As one of the core functions of AI and large language models (LLMs), vector search can be used in various scenarios such as Retrieval-Augmented Generation (RAG), semantic search, and recommendation systems.
Starting from v8.4.0, TiDB supports vector data types and vector search indexes, offering powerful vector search capabilities. TiDB vector data types support up to 16,383 dimensions and various distance functions, including L2 distance (Euclidean distance), cosine distance, negative inner product, and L1 distance (Manhattan distance).
To start vector search, you only need to create a table with a vector data type, insert vector data, and then query the vector data. You can also perform mixed queries of vector data and traditional relational data.
To enhance the performance of vector search, you can create and use vector search indexes. Note that TiDB vector search indexes rely on TiFlash. Before using vector search indexes, make sure that TiFlash nodes are deployed in your TiDB cluster.
For more information, see documentation.
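A minimal end-to-end sketch following the documented vector SQL surface; the table and data are illustrative, and the inline vector index requires TiFlash:

```sql
CREATE TABLE docs (
    id        INT PRIMARY KEY,
    embedding VECTOR(3),
    VECTOR INDEX idx_embedding ((VEC_COSINE_DISTANCE(embedding)))  -- needs TiFlash nodes
);
INSERT INTO docs VALUES (1, '[0.1, 0.2, 0.3]'), (2, '[0.9, 0.8, 0.7]');
-- Nearest neighbor by cosine distance.
SELECT id
FROM docs
ORDER BY VEC_COSINE_DISTANCE(embedding, '[0.1, 0.2, 0.25]')
LIMIT 1;
```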
DB operations
BR supports client-side encryption of log backup data (experimental) #55834 @Tristan1900
In earlier TiDB versions, only snapshot backup data can be encrypted on the client side. Starting from v8.4.0, log backup data can also be encrypted on the client side. Before uploading log backup data to your backup storage, you can encrypt the backup data to ensure its security via one of the following methods:
- Encrypt using a custom fixed key
- Encrypt using a master key stored on a local disk
- Encrypt using a master key managed by a Key Management Service (KMS)
For more information, see documentation.
BR requires fewer privileges when restoring backup data in a cloud storage system #55870 @Leavrth
Before v8.4.0, BR writes checkpoint information about the restore progress to the backup storage system during restore. These checkpoints enable quick resumption of interrupted restores. Starting from v8.4.0, BR writes restore checkpoint information to the target TiDB cluster instead. This means that BR only requires read access to the backup directories during restore.
For more information, see documentation.
Observability
Display the CPU time consumed by TiDB and TiKV in the system table #55542 @yibin87
The Top SQL page of TiDB Dashboard displays SQL statements with high CPU consumption. Starting from v8.4.0, TiDB adds CPU time consumption information to the system table, presented alongside other metrics for sessions or SQL, making it easier to observe high CPU consumption operations from multiple perspectives. This information can help you quickly identify the causes of issues in scenarios like instance CPU spikes or read/write hotspots in clusters.
- The statement summary tables add the `AVG_TIDB_CPU_TIME` and `AVG_TIKV_CPU_TIME` fields, showing the average CPU time consumed by individual SQL statements historically.
- The `INFORMATION_SCHEMA.PROCESSLIST` table adds the `TIDB_CPU` and `TIKV_CPU` fields, showing the cumulative CPU consumption of the SQL statements currently being executed in a session.
- The slow query log adds the `Tidb_cpu_time` and `Tikv_cpu_time` fields, showing the CPU time consumed by captured SQL statements.

By default, the CPU time consumed by TiKV is displayed. Collecting the CPU time consumed by TiDB brings additional overhead (about 8%), so the CPU time consumed by TiDB shows the actual value only when Top SQL is enabled; otherwise, it always shows as `0`.

For more information, see `INFORMATION_SCHEMA.PROCESSLIST` and `INFORMATION_SCHEMA.SLOW_QUERY`.
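A hedged sketch of reading the new columns; remember that the TiDB-side values stay `0` unless Top SQL is enabled:

```sql
-- Rank recent statements by average TiKV CPU time.
SELECT DIGEST_TEXT, AVG_TIDB_CPU_TIME, AVG_TIKV_CPU_TIME
FROM INFORMATION_SCHEMA.STATEMENTS_SUMMARY
ORDER BY AVG_TIKV_CPU_TIME DESC
LIMIT 10;
```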
Top SQL supports viewing aggregated CPU time results by table or database #55540 @nolouch
Before v8.4.0, Top SQL aggregates CPU time by SQL. If CPU time is not consumed by a few SQL statements, aggregation by SQL cannot effectively identify issues. Starting from v8.4.0, you can choose to aggregate CPU time By TABLE or By DB. In scenarios with multiple systems, the new aggregation method can more effectively identify load changes from a specific system, improving diagnostic efficiency.
For more information, see documentation.
Security
BR supports AWS IMDSv2 #16443 @pingyu
When deploying TiDB on Amazon EC2, BR supports AWS Instance Metadata Service Version 2 (IMDSv2). You can configure your EC2 instance to allow BR to use the IAM role associated with the instance for appropriate permissions to access Amazon S3.
For more information, see documentation.
Data migration
TiCDC Claim-Check supports sending only the `value` field of Kafka messages to external storage #11396 @3AceShowHand

Before v8.4.0, when the Claim-Check feature is enabled (by setting `large-message-handle-option` to `claim-check`), TiCDC encodes and stores both the `key` and `value` fields in the external storage system when handling large messages.

Starting from v8.4.0, TiCDC supports sending only the `value` field of Kafka messages to external storage. This feature is only applicable to protocols other than the Open Protocol. You can control this feature by setting the `claim-check-raw-value` parameter.

For more information, see documentation.
TiCDC introduces Checksum V2 to verify old values in Update or Delete events #10969 @3AceShowHand
Starting from v8.4.0, TiDB and TiCDC introduce the Checksum V2 algorithm to address issues of Checksum V1 in verifying old values in Update or Delete events after `ADD COLUMN` or `DROP COLUMN` operations. For clusters created in v8.4.0 or later, or clusters upgraded to v8.4.0, TiDB uses Checksum V2 by default when single-row data checksum verification is enabled. TiCDC supports handling both Checksum V1 and V2. This change only affects the internal implementation of TiDB and TiCDC and does not affect checksum calculation methods for downstream Kafka consumers.

For more information, see documentation.
Compatibility changes
System variables
Variable name | Change type | Description |
---|---|---|
`log_bin` | Deleted | In v8.4.0, TiDB Binlog is removed. This variable indicates whether TiDB Binlog is used, and is deleted starting from v8.4.0. |
`sql_log_bin` | Deleted | In v8.4.0, TiDB Binlog is removed. This variable indicates whether to write changes to TiDB Binlog, and is deleted starting from v8.4.0. |
`tidb_enable_global_index` | Deprecated | In v8.4.0, this variable is deprecated. Its value is fixed to the default value `ON`, that is, the global index feature is enabled by default. You only need to add the `GLOBAL` keyword to the corresponding column when executing `CREATE TABLE` or `ALTER TABLE` to create a global index. |
`tidb_enable_list_partition` | Deprecated | In v8.4.0, this variable is deprecated. Its value is fixed to the default value `ON`, that is, list partitioning is enabled by default. |
`tidb_enable_table_partition` | Deprecated | In v8.4.0, this variable is deprecated. Its value is fixed to the default value `ON`, that is, table partitioning is enabled by default. |
`tidb_analyze_partition_concurrency` | Modified | Changes the value range from `[1, 18446744073709551615]` to `[1, 128]`. |
`tidb_enable_inl_join_inner_multi_pattern` | Modified | Changes the default value from `OFF` to `ON`. Starting from v8.4.0, Index Join is supported by default when the inner table has `Selection`, `Aggregation`, or `Projection` operators on it. |
`tidb_opt_prefer_range_scan` | Modified | Changes the default value from `OFF` to `ON`. For tables with no statistics (pseudo-statistics) or empty tables (zero statistics), the optimizer prefers range scans over full table scans. |
`tidb_scatter_region` | Modified | Before v8.4.0, its type is boolean, only `ON` and `OFF` are supported, and Regions of newly created tables support only table-level scattering after it is enabled. Starting from v8.4.0, the SESSION scope is added, the type is changed from boolean to enumeration, the default value is changed from `OFF` to null, and the optional values `TABLE` and `GLOBAL` are added. In addition, it now supports a cluster-level scattering policy to avoid the TiKV OOM issues caused by uneven distribution of Regions during fast batch table creation. |
`tidb_schema_cache_size` | Modified | Changes the default value from `0` to `536870912` (512 MiB), indicating that this feature is enabled by default. The minimum allowed value is `67108864` (64 MiB). |
`tidb_auto_analyze_concurrency` | Newly added | Sets the concurrency within a single automatic statistics collection task. Before v8.4.0, this concurrency is fixed at `1`. To speed up statistics collection tasks, you can increase this concurrency based on your cluster's available resources. |
`tidb_enable_instance_plan_cache` | Newly added | Controls whether to enable the Instance Plan Cache feature. |
`tidb_enable_stats_owner` | Newly added | Controls whether the corresponding TiDB instance can run automatic statistics update tasks. |
`tidb_hash_join_version` | Newly added | Controls whether TiDB uses the optimized version of the Hash Join operator. The default value `legacy` means that the optimized version is not used. If you set it to `optimized`, TiDB uses the optimized version of the Hash Join operator to improve performance. |
`tidb_instance_plan_cache_max_size` | Newly added | Sets the maximum memory usage for Instance Plan Cache. |
`tidb_instance_plan_cache_reserved_percentage` | Newly added | Controls the percentage of idle memory reserved for Instance Plan Cache after memory eviction. |
`tidb_pre_split_regions` | Newly added | Before v8.4.0, setting the default number of pre-split Regions for newly created tables required declaring `PRE_SPLIT_REGIONS` in each `CREATE TABLE` SQL statement, which is cumbersome when a large number of tables need to be configured similarly. This variable is introduced to solve such problems. You can set it at the GLOBAL or SESSION level to improve usability. |
`tidb_shard_row_id_bits` | Newly added | Before v8.4.0, setting the default number of shard bits for row IDs of newly created tables required declaring `SHARD_ROW_ID_BITS` in each `CREATE TABLE` or `ALTER TABLE` SQL statement, which is cumbersome when a large number of tables need to be configured similarly. This variable is introduced to solve such problems. You can set it at the GLOBAL or SESSION level to improve usability. |
`tidb_tso_client_rpc_mode` | Newly added | Switches the mode in which TiDB sends TSO RPC requests to PD. The mode determines whether TSO RPC requests can be processed in parallel and affects the time spent on batch-waiting for each TS retrieval operation, thereby helping reduce the wait time for retrieving TS during query execution in certain scenarios. |
Configuration parameters
Configuration file or component | Configuration parameter | Change type | Description |
---|---|---|---|
TiDB | `grpc-keepalive-time` | Modified | Adds the minimum value of `1`. |
TiDB | `grpc-keepalive-timeout` | Modified | Before v8.4.0, the data type of this parameter is INT, and the minimum value is `1`. Starting from v8.4.0, the data type is changed to FLOAT64, and the minimum value becomes `0.05`. In scenarios where network jitter occurs frequently, you can reduce the impact of network jitter on performance by setting a smaller value to shorten the retry interval. |
TiDB | `tidb_enable_stats_owner` | Newly added | Controls whether the corresponding TiDB instance can run automatic statistics update tasks. |
TiKV | `region-split-keys` | Modified | Changes the default value from `"960000"` to `"2560000"`. |
TiKV | `region-split-size` | Modified | Changes the default value from `"96MiB"` to `"256MiB"`. |
TiKV | `sst-max-size` | Modified | Changes the default value from `"144MiB"` to `"384MiB"`. |
TiKV | `pessimistic-txn.in-memory-instance-size-limit` | Newly added | Controls the memory usage limit for in-memory pessimistic locks in a TiKV instance. When this limit is exceeded, TiKV writes pessimistic locks persistently. |
TiKV | `pessimistic-txn.in-memory-peer-size-limit` | Newly added | Controls the memory usage limit for in-memory pessimistic locks in a Region. When this limit is exceeded, TiKV writes pessimistic locks persistently. |
TiKV | `raft-engine.spill-dir` | Newly added | Controls the secondary directory in which TiKV instances store Raft log files, supporting multi-disk storage of Raft log files. |
TiKV | `resource-control.priority-ctl-strategy` | Newly added | Controls the management policies for low-priority tasks. TiKV ensures that higher-priority tasks are executed first by applying flow control to low-priority tasks. |
PD | `cert-allowed-cn` | Modified | Starting from v8.4.0, configuring multiple Common Names is supported. Before v8.4.0, only one Common Name can be set. |
PD | `max-merge-region-keys` | Modified | Changes the default value from `200000` to `540000`. |
PD | `max-merge-region-size` | Modified | Changes the default value from `20` to `54`. |
TiFlash | `storage.format_version` | Modified | Changes the default TiFlash storage format version from `5` to `7` to support vector index creation and storage. Due to this format change, TiFlash clusters upgraded to v8.4.0 or a later version do not support in-place downgrading to earlier versions. |
TiDB Binlog | `--enable-binlog` | Deleted | In v8.4.0, TiDB Binlog is removed. This parameter controls whether to enable TiDB binlog generation, and is deleted starting from v8.4.0. |
TiCDC | `claim-check-raw-value` | Newly added | Controls whether TiCDC sends only the `value` field of Kafka messages to external storage. This feature is only applicable to protocols other than the Open Protocol. |
TiDB Lightning | `logical-import-prep-stmt` | Newly added | In logical import mode, this parameter controls whether to use prepared statements and the client statement cache to improve performance. The default value is `false`. |
BR | `--log.crypter.key` | Newly added | Specifies the encryption key in hexadecimal string format for log backup data. It is a 128-bit (16-byte) key for the algorithm `aes128-ctr`, a 24-byte key for the algorithm `aes192-ctr`, and a 32-byte key for the algorithm `aes256-ctr`. |
BR | `--log.crypter.key-file` | Newly added | Specifies the key file for log backup data. You can directly pass in the file path where the key is stored as a parameter without passing in `crypter.key`. |
BR | `--log.crypter.method` | Newly added | Specifies the encryption algorithm for log backup data, which can be `aes128-ctr`, `aes192-ctr`, or `aes256-ctr`. The default value is `plaintext`, indicating that data is not encrypted. |
BR | `--master-key` | Newly added | Specifies the master key for log backup data. It can be a master key stored on a local disk or a master key managed by a cloud Key Management Service (KMS). |
BR | `--master-key-crypter-method` | Newly added | Specifies the encryption algorithm based on the master key for log backup data, which can be `aes128-ctr`, `aes192-ctr`, or `aes256-ctr`. The default value is `plaintext`, indicating that data is not encrypted. |
Offline package changes
Starting from v8.4.0, the following contents are removed from the `TiDB-community-toolkit` binary package:

- `pump-{version}-linux-{arch}.tar.gz`
- `drainer-{version}-linux-{arch}.tar.gz`
- `binlogctl`
- `arbiter`
Removed features
The following features are removed starting from v8.4.0:
- In v8.4.0, TiDB Binlog is removed. Starting from v8.3.0, TiDB Binlog is fully deprecated. For incremental data replication, use TiCDC instead. For point-in-time recovery (PITR), use PITR. Before you upgrade your TiDB cluster to v8.4.0 or later versions, be sure to switch to TiCDC and PITR.
The following features are planned for removal in future versions:
- Starting from v8.0.0, TiDB Lightning deprecates the old version of the conflict detection strategy for the physical import mode and enables you to control the conflict detection strategy for both logical and physical import modes via the `conflict.strategy` parameter. The `duplicate-resolution` parameter for the old version of conflict detection will be removed in a future release.
Deprecated features
The following features are planned for deprecation in future versions:
- TiDB introduces the system variable `tidb_enable_auto_analyze_priority_queue`, which controls whether priority queues are enabled to optimize the ordering of tasks that automatically collect statistics. In future releases, the priority queue will be the only way to order tasks for automatically collecting statistics, so this system variable will be deprecated.
- TiDB introduces the system variable `tidb_enable_async_merge_global_stats` in v7.5.0. You can use it to set TiDB to use asynchronous merging of partition statistics to avoid OOM issues. In future releases, partition statistics will be merged asynchronously, so this system variable will be deprecated.
- It is planned to redesign the automatic evolution of execution plan bindings in subsequent releases, and the related variables and behavior will change.
- In v8.0.0, TiDB introduces the `tidb_enable_parallel_hashagg_spill` system variable to control whether TiDB supports disk spill for the concurrent HashAgg algorithm. In future versions, the `tidb_enable_parallel_hashagg_spill` system variable will be deprecated.
- The TiDB Lightning parameter `conflict.max-record-rows` is planned for deprecation in a future release and will be subsequently removed. This parameter will be replaced by `conflict.threshold`, which means that the maximum number of recorded conflicting records is consistent with the maximum number of conflicting records that can be tolerated in a single import task.
- Starting from v6.3.0, partitioned tables use dynamic pruning mode by default. Compared with static pruning mode, dynamic pruning mode supports features such as IndexJoin and plan cache with better performance. Therefore, static pruning mode will be deprecated.
Improvements
TiDB
- Optimize the efficiency of constructing BatchCop tasks when scanning a large amount of data #55915 #55413 @wshwsh12
- Optimize the transaction's buffer to reduce write latency in transactions and TiDB CPU usage #55287 @you06
- Optimize the execution performance of DML statements when the system variable `tidb_dml_type` is set to `"bulk"` #50215 @ekexium
- Support using Optimizer Fix Control 47400 to control whether the optimizer limits the minimum value estimated for `estRows` to `1`, which is consistent with databases such as Oracle and DB2 #47400 @terry1purcell
- Add write control to the `mysql.tidb_runaway_queries` log table to reduce overhead caused by a large number of concurrent writes #54434 @HuSharp
- Support Index Join by default when the inner table has `Selection`, `Projection`, or `Aggregation` operators on it #47233 @winoros
- Reduce the number of column details fetched from TiKV for `DELETE` operations in certain scenarios, lowering the resource overhead of these operations #38911 @winoros
- Support setting the concurrency within a single automatic statistics collection task using the system variable `tidb_auto_analyze_concurrency` #53460 @hawkingrei
- Optimize the logic of an internal function to improve performance when querying tables with numerous columns #52112 @Rustin170506
- Simplify filter conditions like `a = 1 AND (a > 1 OR (a = 1 AND b = 2))` to `a = 1 AND b = 2` #56005 @ghazalfamilyusa
- Increase the cost of table scans in the cost model for scenarios with a high risk of suboptimal execution plans, making the optimizer prefer indexes #56012 @terry1purcell
- TiDB supports the two-argument variant `MID(str, pos)` #52420 @dveeden
- Support splitting TTL tasks for tables with non-binary primary keys #55660 @lcwangchao
- Optimize performance of system metadata-related statements #50305 @ywqzzy @tangenta @joechenrh @CbcWestwolf
- Implement a new priority queue for auto-analyze operations to improve analyze performance and reduce the cost of rebuilding the queue #55906 @Rustin170506
- Introduce a DDL notifier to allow the statistics module to subscribe to DDL events #55722 @fzzf678 @lance6716 @Rustin170506
- Force new TiDB nodes to take over DDL ownership during TiDB upgrades to avoid compatibility issues caused by old TiDB nodes taking ownership #51285 @wjhuang2016
- Support cluster-level Scatter Region #8424 @River2000i
TiKV
- Increase the default value of Region from 96 MiB to 256 MiB to avoid the extra overhead caused by too many Regions #17309 @LykxSassinator
- Support setting memory usage limits for in-memory pessimistic locks in a Region or TiKV instance. When hot write scenarios cause a large number of pessimistic locks, you can increase the memory limits via configuration. This helps avoid CPU and I/O overhead caused by pessimistic locks being written to disk. #17542 @cfzjywxk
- Introduce a new `spill-dir` configuration item in Raft Engine, supporting multi-disk storage for Raft logs. When the disk where the home directory (`dir`) is located runs out of space, Raft Engine automatically writes new logs to `spill-dir`, ensuring continuous operation of the system #17356 @LykxSassinator
- Optimize the compaction trigger mechanism of RocksDB to accelerate disk space reclamation when handling a large number of DELETE versions #17269 @AndreMouche
- Support dynamically modifying flow-control configurations for write operations #17395 @glorv
- Improve the speed of Region Merge in scenarios with empty tables and small Regions #17376 @LykxSassinator
- Prevent Pipelined DML from blocking resolved-ts for long periods #17459 @ekexium
PD
TiFlash
- Optimize the execution efficiency of the `LENGTH()` and `ASCII()` functions #9344 @xzhangxian1008
- Reduce the number of threads that TiFlash needs to create when processing disaggregated storage and compute requests, helping avoid crashes of TiFlash compute nodes when processing a large number of such requests #9334 @JinheLin
- Enhance the task waiting mechanism in the pipeline execution model #8869 @SeaRise
- Improve the cancel mechanism of the JOIN operator, so that the JOIN operator can respond to cancel requests in a timely manner #9430 @windtalker
Tools
Backup & Restore (BR)
- Disable splitting Regions by table to improve restore speed when restoring data to a cluster where the `split-table` and `split-region-on-table` configuration items are `false` (default value) #53532 @Leavrth
- Disable full data restoration to a non-empty cluster using the `RESTORE` SQL statement by default #55087 @BornChanger
Bug fixes
TiDB
- Fix the issue that a deadlock might occur when the `tidb_restricted_read_only` variable is set to `true` #53822 #55373 @Defined2014
- Fix the issue that TiDB does not wait for auto-commit transactions to complete during graceful shutdown #55464 @YangKeao
- Fix the issue that reducing the value of `tidb_ttl_delete_worker_count` during TTL job execution makes the job fail to complete #55561 @lcwangchao
- Fix the issue that if the index of a table contains generated columns, an `Unknown column 'column_name' in 'expression'` error might occur when collecting statistics for the table via the `ANALYZE` statement #55438 @hawkingrei
- Deprecate unnecessary configurations related to statistics to reduce redundant code #55043 @Rustin170506
- Fix the issue that TiDB might hang or return incorrect results when executing a query containing a correlated subquery and CTE #55551 @guo-shaoge
- Fix the issue that disabling `lite-init-stats` might cause statistics to fail to load synchronously #54532 @hawkingrei
- Fix the issue that when an `UPDATE` or `DELETE` statement contains a recursive CTE, the statement might report an error or not take effect #55666 @time-and-fate
- Fix the issue that a SQL binding containing window functions might not take effect in some cases #55981 @winoros
- Fix the issue that statistics for string columns with non-binary collations might fail to load when initializing statistics #55684 @winoros
- Fix the issue that the optimizer incorrectly estimates the number of rows as 1 when accessing a unique index with the query condition `column IS NULL` #56116 @hawkingrei
- Fix the issue that the optimizer does not use the best multi-column statistics information for row count estimation when the query contains filter conditions like `(... AND ...) OR (... AND ...) ...` #54323 @time-and-fate
- Fix the issue that the `read_from_storage` hint might not take effect when the query has an available Index Merge execution plan #56217 @AilinKid
- Fix the data race issue in `IndexNestedLoopHashJoin` #49692 @solotzg
- Fix the issue that the `SUB_PART` value in the `INFORMATION_SCHEMA.STATISTICS` table is `NULL` #55812 @Defined2014
- Fix the issue that an error occurs when a DML statement contains nested generated columns #53967 @wjhuang2016
- Fix the issue that integer data with the minimum display length in a division operation might cause the division result to overflow #55837 @windtalker
- Fix the issue that the operator that follows the TopN operator cannot trigger the fallback action when the memory limit is exceeded #56185 @xzhangxian1008
- Fix the issue that the `ORDER BY` column in the Sort operator is stuck if it contains a constant #55344 @xzhangxian1008
- Fix the issue that when adding an index, the `8223 (HY000)` error occurs after killing the PD leader, and the data in the table is inconsistent #55488 @tangenta
- Fix the issue that too many DDL history jobs cause OOM when you request information about history DDL jobs #55711 @joccau
- Fix the issue that executing `IMPORT INTO` gets stuck when Global Sort is enabled and the Region size exceeds 96 MiB #55374 @lance6716
- Fix the issue that executing `IMPORT INTO` on a temporary table causes TiDB to crash #55970 @D3Hunter
- Fix the issue that adding a unique index causes the `duplicate entry` error #56161 @tangenta
- Fix the issue that TiDB Lightning does not ingest all KV pairs when TiKV is down for more than 810 seconds, resulting in inconsistent data in the table #55808 @lance6716
- Fix the issue that the `CREATE TABLE LIKE` statement cannot be used for cached tables #56134 @tiancaiamao
- Fix the confusing warning message for `FORMAT()` expressions in CTE #56198 @dveeden
- Fix the issue that column type restrictions are inconsistent between `CREATE TABLE` and `ALTER TABLE` when creating a partitioned table #56094 @mjonss
- Fix the incorrect time type in the `INFORMATION_SCHEMA.RUNAWAY_WATCHES` table #54770 @HuSharp
TiKV
- Fix the issue that prevents master key rotation when the master key is stored in a Key Management Service (KMS) #17410 @hhwyt
- Fix a traffic control issue that might occur after deleting large tables or partitions #17304 @Connor1996
- Fix the issue that TiKV might panic when a stale replica processes Raft snapshots, triggered by a slow split operation and immediate removal of the new replica #17469 @hbisheng
TiFlash
- Fix the issue that TiFlash fails to parse the table schema when the table contains Bit-type columns with a default value that contains invalid characters #9461 @Lloyd-Pottiger
- Fix the issue that TiFlash might panic due to spurious Region overlap check failures that occur when multiple Regions are concurrently applying snapshots #9329 @CalvinNeo
- Fix the issue that some JSON functions unsupported by TiFlash are pushed down to TiFlash #9444 @windtalker
Tools
Backup & Restore (BR)
- Fix the issue that the PITR checkpoint interval shown in monitoring increases abnormally when TiDB nodes stop, which does not reflect the actual situation #42419 @YuJuncen
- Fix the issue that backup tasks might get stuck if TiKV becomes unresponsive during the backup process #53480 @Leavrth
- Fix the issue that BR logs might print sensitive credential information when log backup is enabled #55273 @RidRisR
- Fix the issue that after a log backup PITR task fails and you stop it, the safepoints related to that task are not properly cleared in PD #17316 @Leavrth
TiDB Data Migration (DM)
TiDB Lightning
- Fix the issue that TiDB Lightning reports a `verify allocator base failed` error when two instances simultaneously start parallel import tasks and are assigned the same task ID #55384 @ei-sugimoto
Contributors
We would like to thank the following contributors from the TiDB community:
- ei-sugimoto
- eltociear
- guoshouyan (First-time contributor)
- JackL9u
- kafka1991 (First-time contributor)
- qingfeng777
- samba-rgb (First-time contributor)
- SeaRise
- tuziemon (First-time contributor)
- xyproto (First-time contributor)