TiDB 7.0.0 Release Notes
Release date: March 30, 2023
TiDB version: 7.0.0-DMR
Quick access: Quick start
In v7.0.0-DMR, the key new features and improvements are as follows:
Category | Feature | Description |
---|---|---|
Scalability and Performance | Session level non-prepared SQL plan cache (experimental) | Support automatically reusing plan cache at the session level to reduce compilation and shorten the query time for the same SQL patterns without manually preparing statements in advance. |
Scalability and Performance | TiFlash supports the disaggregated storage and compute architecture and S3 shared storage (experimental) | TiFlash introduces a cloud-native architecture as an option. |
Reliability and Availability | Resource control enhancement (experimental) | Support using resource groups to allocate and isolate resources for various applications or workloads within one cluster. In this release, TiDB adds support for different resource binding modes (user, session, and statement levels) and user-defined priorities. You can also use commands to perform resource calibration (estimation for the whole resource amount). |
Reliability and Availability | TiFlash supports spill to disk | TiFlash supports intermediate result spill to disk to mitigate OOMs in data-intensive operations such as aggregations, sorts, and hash joins. |
SQL | Row-level TTL (GA) | Support managing database size and improving performance by automatically expiring data of a certain age. |
SQL | Reorganize LIST/RANGE partition | The REORGANIZE PARTITION statement can be used for merging adjacent partitions or splitting one partition into many, which provides better usability of partitioned tables. |
DB Operations and Observability | TiDB enhances the functionalities of LOAD DATA statements (experimental) | TiDB enhances the functionalities of LOAD DATA SQL statements, such as supporting data import from S3/GCS. |
DB Operations and Observability | TiCDC supports object storage sink (GA) | TiCDC supports replicating row change events to object storage services, including Amazon S3, GCS, Azure Blob Storage, and NFS. |
Feature details
Scalability
TiFlash supports the disaggregated storage and compute architecture and supports object storage in this architecture (experimental) #6882 @flowbehappy
Before v7.0.0, TiFlash only supports the coupled storage and compute architecture. In this architecture, each TiFlash node acts as both storage and compute node, and its computing and storage capabilities cannot be independently expanded. In addition, TiFlash nodes can only use local storage.
Starting from v7.0.0, TiFlash also supports the disaggregated storage and compute architecture. In this architecture, TiFlash nodes are divided into two types (Compute Nodes and Write Nodes) and support object storage that is compatible with S3 API. Both types of nodes can be independently scaled for computing or storage capacities. The disaggregated storage and compute architecture and coupled storage and compute architecture cannot be used in the same cluster or converted to each other. You can configure which architecture to use when you deploy TiFlash.
For more information, see documentation.
Performance
Achieve compatibility between Fast Online DDL and PITR #38045 @Leavrth
In TiDB v6.5.0, Fast Online DDL is not fully compatible with PITR. To ensure a full data backup, it is recommended to first stop the PITR background backup task, quickly add indexes using Fast Online DDL, and then resume the PITR backup task.
Starting from TiDB v7.0.0, Fast Online DDL and PITR are fully compatible. When restoring cluster data through PITR, the index operations added via Fast Online DDL during log backup will be automatically replayed to achieve compatibility.
For more information, see documentation.
TiFlash supports null-aware semi join and null-aware anti semi join operators #6674 @gengliqi
When using IN, NOT IN, = ANY, or != ALL operators in correlated subqueries, TiDB optimizes the computing performance by converting them to semi join or anti semi join. If the join key column might be NULL, a null-aware join algorithm is required, such as Null-aware semi join and Null-aware anti semi join.
Before v7.0.0, TiFlash does not support null-aware semi join and null-aware anti semi join operators, preventing these subqueries from being directly pushed down to TiFlash. Starting from v7.0.0, TiFlash supports null-aware semi join and null-aware anti semi join operators. If a SQL statement contains these correlated subqueries, the tables in the query have TiFlash replicas, and MPP mode is enabled, the optimizer automatically determines whether to push down null-aware semi join and null-aware anti semi join operators to TiFlash to improve overall performance.
For more information, see documentation.
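As a rough illustration of the kind of query this affects, the sketch below assumes two hypothetical tables, t1 and t2, in a cluster that has at least one TiFlash node. The column t2.b is nullable, which is what makes null-aware semantics necessary for NOT IN. Whether the operator is pushed down remains the optimizer's decision, so inspect the plan rather than assume it.

```sql
-- Hypothetical tables; t2.b is nullable, so NOT IN needs null-aware handling.
CREATE TABLE t1 (a INT);
CREATE TABLE t2 (b INT);

-- The tables in the query must have TiFlash replicas.
ALTER TABLE t1 SET TIFLASH REPLICA 1;
ALTER TABLE t2 SET TIFLASH REPLICA 1;

-- MPP mode must be enabled for the push-down to be considered.
SET SESSION tidb_allow_mpp = ON;

-- Check whether the anti semi join is scheduled on TiFlash.
EXPLAIN SELECT * FROM t1 WHERE t1.a NOT IN (SELECT t2.b FROM t2);
```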
TiFlash supports using FastScan (GA) #5252 @hongyunyan
Starting from v6.3.0, TiFlash introduces FastScan as an experimental feature. In v7.0.0, this feature becomes generally available. You can enable FastScan using the system variable tiflash_fastscan. By sacrificing strong consistency, this feature significantly improves table scan performance. If the corresponding table only involves INSERT operations without any UPDATE/DELETE operations, FastScan can keep strong consistency and improve the scan performance.
For more information, see documentation.
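A minimal sketch of turning FastScan on for one session is shown below. The table name append_only_events is hypothetical and stands for an insert-only table with a TiFlash replica, which is the case where the text above says consistency is preserved.

```sql
-- Enable FastScan for the current session only.
SET SESSION tiflash_fastscan = ON;

-- Hypothetical insert-only table with a TiFlash replica.
SELECT COUNT(*) FROM append_only_events;

-- Switch back to the default behavior when strong consistency is required.
SET SESSION tiflash_fastscan = OFF;
```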
TiFlash supports late materialization (experimental) #5829 @Lloyd-Pottiger
When processing a SELECT statement with filter conditions (WHERE clause), TiFlash reads all the data from the columns required by the query by default, and then filters and aggregates the data based on the query conditions. Late materialization is an optimization method that supports pushing down part of the filter conditions to the TableScan operator. That is, TiFlash first scans the column data related to the filter conditions that are pushed down, filters the rows that meet the condition, and then scans the other column data of these rows for further calculation, thereby reducing IO scans and computations of data processing.
The TiFlash late materialization feature is not enabled by default. You can enable it by setting the tidb_opt_enable_late_materialization system variable to ON. When the feature is enabled, the TiDB optimizer determines which filter conditions to push down based on statistics and the filter conditions themselves.
For more information, see documentation.
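The following sketch enables late materialization for one session. The table t and its columns are hypothetical, and the table is assumed to have a TiFlash replica. The optimizer still decides which conditions, if any, are pushed down to TableScan.

```sql
-- Enable TiFlash late materialization for the current session.
SET SESSION tidb_opt_enable_late_materialization = ON;

-- Hypothetical wide table: only column a is needed to evaluate the filter,
-- so the remaining columns can be read for the matching rows only.
EXPLAIN SELECT a, b, c FROM t WHERE a < 10;
```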
Support caching execution plans for non-prepared statements (experimental) #36598 @qw4990
The execution plan cache is important for improving the load capacity of concurrent OLTP, and TiDB already supports Prepared execution plan cache. In v7.0.0, TiDB can also cache execution plans for non-prepared statements, expanding the scope of execution plan cache and improving the concurrent processing capacity of TiDB.
This feature is disabled by default. You can enable it by setting the system variable tidb_enable_non_prepared_plan_cache to ON. For stability reasons, TiDB v7.0.0 allocates a new area for caching non-prepared execution plans and you can set the cache size using the system variable tidb_non_prepared_plan_cache_size. Additionally, this feature has certain restrictions on SQL statements. For more information, see Restrictions.
For more information, see documentation.
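A short sketch of trying the feature in a session follows. The table t is hypothetical, and the cache size of 200 is an arbitrary illustrative value. Checking @@last_plan_from_cache after the second statement is one way to see whether the plan was reused.

```sql
-- Enable the non-prepared plan cache for the current session.
SET SESSION tidb_enable_non_prepared_plan_cache = ON;

-- Optional: adjust how many plans the non-prepared cache may hold.
SET SESSION tidb_non_prepared_plan_cache_size = 200;

-- Two statements with the same pattern and different literals.
SELECT * FROM t WHERE a < 1;
SELECT * FROM t WHERE a < 2;

-- Reports whether the previous statement used a cached plan.
SELECT @@last_plan_from_cache;
```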
TiDB removes the execution plan cache constraint for subqueries #40219 @fzzf678
TiDB v7.0.0 removes the execution plan cache constraint for subqueries. This means that the execution plan of SQL statements with subqueries can now be cached, such as SELECT * FROM t WHERE a > (SELECT ...). This feature further expands the application scope of execution plan cache and improves the execution efficiency of SQL queries.
For more information, see documentation.
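As an illustration only, the sketch below prepares a statement that contains a subquery and executes it twice. The tables t and t2 and their columns are hypothetical. The variable tidb_enable_plan_cache_for_subquery, listed under System variables in these notes, controls this behavior.

```sql
-- Hypothetical tables t(a) and t2(b).
PREPARE st FROM 'SELECT * FROM t WHERE a > (SELECT MIN(b) FROM t2 WHERE b > ?)';

SET @v = 1;
EXECUTE st USING @v;
EXECUTE st USING @v;

-- Reports whether the second execution reused a cached plan.
SELECT @@last_plan_from_cache;

DEALLOCATE PREPARE st;
```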
TiKV supports automatically generating empty log files for log recycling #14371 @LykxSassinator
In v6.3.0, TiKV introduced the Raft log recycling feature to reduce long-tail latency caused by write load. However, log recycling can only take effect when the number of Raft log files reaches a certain threshold, making it difficult for users to directly experience the throughput improvement brought by this feature.
In v7.0.0, a new configuration item called raft-engine.prefill-for-recycle was introduced to improve user experience. This item controls whether empty log files are generated for recycling when the process starts. When this configuration is enabled, TiKV automatically fills a batch of empty log files during initialization, ensuring that log recycling takes effect immediately after initialization.
For more information, see documentation.
Support deriving the TopN or Limit operator from window functions to improve window function performance #13936 @windtalker
This feature is disabled by default. To enable it, you can set the session variable tidb_opt_derive_topn to ON.
For more information, see documentation.
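The query shape this rule targets can be sketched as follows, assuming a hypothetical table t with an id column. The outer filter on the row number is what lets the optimizer derive a TopN or Limit; compare the plans with the variable on and off to see the effect.

```sql
-- Enable the rule for the current session.
SET SESSION tidb_opt_derive_topn = ON;

-- Only the first three rows by id are needed, so the window function
-- does not have to number every row of the table.
EXPLAIN
SELECT *
FROM (SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn FROM t) dt
WHERE rn <= 3;
```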
Support creating unique indexes through Fast Online DDL #40730 @tangenta
TiDB v6.5.0 supports creating ordinary secondary indexes via Fast Online DDL. TiDB v7.0.0 supports creating unique indexes via Fast Online DDL. Compared to v6.1.0, adding unique indexes to large tables is expected to be several times faster with improved performance.
For more information, see documentation.
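A minimal sketch is shown below. It assumes that Fast Online DDL is governed by the system variable tidb_ddl_enable_fast_reorg, which is not named in this section; the table and index names are hypothetical.

```sql
-- Assumption: this variable is the switch for Fast Online DDL.
SET GLOBAL tidb_ddl_enable_fast_reorg = ON;

-- Hypothetical large table; the unique index is built through the fast path.
ALTER TABLE orders ADD UNIQUE INDEX uk_order_no (order_no);
```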
Reliability
Enhance the resource control feature (experimental) #38825 @nolouch @BornChanger @glorv @tiancaiamao @Connor1996 @JmPotato @hnes @CabinfeverB @HuSharp
TiDB enhances the resource control feature based on resource groups. This feature significantly improves the resource utilization efficiency and performance of TiDB clusters. The introduction of the resource control feature is a milestone for TiDB. You can divide a distributed database cluster into multiple logical units, map different database users to corresponding resource groups, and set the quota for each resource group as needed. When the cluster resources are limited, all resources used by sessions in the same resource group are limited to the quota. In this way, even if a resource group is over-consumed, the sessions in other resource groups are not affected.
With this feature, you can combine multiple small and medium-sized applications from different systems into a single TiDB cluster. When the workload of an application grows larger, it does not affect the normal operation of other applications. When the system workload is low, busy applications can still be allocated the required system resources even if they exceed the set quotas, so as to achieve the maximum utilization of resources. In addition, the rational use of the resource control feature can reduce the number of clusters, ease the difficulty of operation and maintenance, and save management costs.
This feature provides a built-in Resource Control Dashboard for the actual usage of resources in Grafana, assisting you to allocate resources more rationally. It also supports dynamic resource management capabilities based on both session and statement levels (Hint). The introduction of this feature will help you gain more precise control over the resource usage of your TiDB cluster, and dynamically adjust quotas based on actual needs.
In TiDB v7.0.0, you can set the absolute scheduling priority (PRIORITY) for resource groups to guarantee that important services can get resources. This release also extends the ways to set resource groups.
You can use resource groups in the following ways:
- User level. Bind a user to a specific resource group using the CREATE USER or ALTER USER statements. After binding a resource group to a user, sessions newly created by the user are automatically bound to the corresponding resource group.
- Session level. Set the resource group used by the current session via SET RESOURCE GROUP.
- Statement level. Set the resource group used by the current statement via RESOURCE_GROUP().
For more information, see documentation.
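The three binding levels can be sketched as follows. The group names, RU quotas, user name, password, and table are hypothetical placeholders chosen for illustration, not recommended values.

```sql
-- Two hypothetical groups with different quotas and priorities.
CREATE RESOURCE GROUP IF NOT EXISTS rg_online RU_PER_SEC = 2000 PRIORITY = HIGH;
CREATE RESOURCE GROUP IF NOT EXISTS rg_batch RU_PER_SEC = 500 PRIORITY = LOW;

-- User level: new sessions of this user run in rg_batch.
CREATE USER IF NOT EXISTS 'report_user'@'%' IDENTIFIED BY 'change-me' RESOURCE GROUP rg_batch;
ALTER USER 'report_user'@'%' RESOURCE GROUP rg_batch;

-- Session level: switch the current session to rg_online.
SET RESOURCE GROUP rg_online;

-- Statement level: run a single statement in rg_batch through a hint.
SELECT /*+ RESOURCE_GROUP(rg_batch) */ COUNT(*) FROM t;
```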
Support a checkpoint mechanism for Fast Online DDL, improving fault tolerance and automatic recovery capability #42164 @tangenta
TiDB v7.0.0 introduces a checkpoint mechanism for Fast Online DDL, which significantly improves its fault tolerance and automatic recovery capabilities. By periodically recording and synchronizing the DDL progress, ongoing DDL operations can continue to be executed in Fast Online DDL mode even if there is a TiDB DDL Owner failure or switch. This makes the execution of DDL more stable and efficient.
For more information, see documentation.
TiFlash supports spilling to disk #6528 @windtalker
To improve execution performance, TiFlash runs data entirely in memory as much as possible. When the amount of data exceeds the total size of memory, TiFlash terminates the query to avoid system crashes caused by running out of memory. Therefore, the amount of data that TiFlash can handle is limited by the available memory.
Starting from v7.0.0, TiFlash supports spilling to disk. By adjusting the threshold of memory usage for operators (tidb_max_bytes_before_tiflash_external_group_by, tidb_max_bytes_before_tiflash_external_sort, and tidb_max_bytes_before_tiflash_external_join), you can control the maximum amount of memory that an operator can use. When the memory used by the operator exceeds the threshold, it automatically writes data to disk. This sacrifices some performance but allows for processing of more data.
For more information, see documentation.
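For illustration, the thresholds can be set per session as below. The values are in bytes, and 10 GiB is an arbitrary example rather than a recommendation; size them against the memory actually available to TiFlash.

```sql
-- 10 GiB per operator, expressed in bytes (illustrative value only).
SET SESSION tidb_max_bytes_before_tiflash_external_group_by = 10737418240;
SET SESSION tidb_max_bytes_before_tiflash_external_sort = 10737418240;
SET SESSION tidb_max_bytes_before_tiflash_external_join = 10737418240;
```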
Improve the efficiency of collecting statistics #41930 @xuyifangreeneyes
In v7.0.0, TiDB further optimizes the logic of collecting statistics, reducing the collection time by about 25%. This optimization improves the operational efficiency and stability of large database clusters, reducing the impact of statistics collection on cluster performance.
Add new optimizer hints for MPP optimization #39710 @Reminiscent
In v7.0.0, TiDB adds a series of optimizer hints to influence the generation of MPP execution plans.
- SHUFFLE_JOIN(): takes effect on MPP. It hints the optimizer to use the Shuffle Join algorithm for the specified table.
- BROADCAST_JOIN(): takes effect on MPP. It hints the optimizer to use the Broadcast Join algorithm for the specified table.
- MPP_1PHASE_AGG(): takes effect on MPP. It hints the optimizer to use the one-phase aggregation algorithm for all aggregate functions in the specified query block.
- MPP_2PHASE_AGG(): takes effect on MPP. It hints the optimizer to use the two-phase aggregation algorithm for all aggregate functions in the specified query block.
MPP optimizer hints can help you intervene in HTAP queries, improving performance and stability for HTAP workloads.
For more information, see documentation.
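The hints listed above might be used as in the sketch below. The tables t1 and t2 are hypothetical and are assumed to have TiFlash replicas with MPP mode enabled; the hints have no effect on plans that do not run in MPP.

```sql
-- Ask for a shuffle join between the two tables.
SELECT /*+ SHUFFLE_JOIN(t1, t2) */ * FROM t1, t2 WHERE t1.id = t2.id;

-- Ask for a broadcast join instead.
SELECT /*+ BROADCAST_JOIN(t1, t2) */ * FROM t1, t2 WHERE t1.id = t2.id;

-- Ask for two-phase aggregation in this query block.
SELECT /*+ MPP_2PHASE_AGG() */ t1.id, COUNT(*) FROM t1 GROUP BY t1.id;
```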
Optimizer hints support specifying join methods and join orders #36600 @Reminiscent
In v7.0.0, the optimizer hint LEADING() can be used in conjunction with hints that affect the join method, and their behaviors are compatible. In the case of multi-table joins, you can effectively specify the optimal join method and join order, thereby enhancing the control of optimizer hints over execution plans.
The new hint behavior has minor changes. To ensure forward compatibility, TiDB introduces the system variable tidb_opt_advanced_join_hint. When this variable is set to OFF, the optimizer hint behavior is compatible with earlier versions. When you upgrade your cluster from earlier versions to v7.0.0 or later versions, this variable will be set to OFF. To obtain more flexible hint behavior, after you confirm that the behavior does not cause a performance regression, it is strongly recommended to set this variable to ON.
For more information, see documentation.
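A sketch of combining a join-order hint with a join-method hint is given below, using three hypothetical tables. Run it with the variable set to OFF and then ON and compare the plans before changing the setting on a production cluster.

```sql
-- Opt in to the new hint behavior for the current session.
SET SESSION tidb_opt_advanced_join_hint = ON;

-- LEADING fixes the join order; HASH_JOIN requests the join method for t3.
EXPLAIN
SELECT /*+ LEADING(t1, t2) HASH_JOIN(t3) */ *
FROM t1
JOIN t2 ON t1.id = t2.id
JOIN t3 ON t2.id = t3.id;
```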
Availability
Support the prefer-leader option, which provides higher availability for read operations and reduces response latency in unstable network conditions #40905 @LykxSassinator
You can control TiDB's data reading behavior through the system variable tidb_replica_read. In v7.0.0, this variable adds the prefer-leader option. When the variable is set to prefer-leader, TiDB prioritizes selecting the leader replica to perform read operations. When the processing speed of the leader replica slows down significantly, such as due to disk or network performance fluctuations, TiDB selects other available follower replicas to perform read operations, providing higher availability and reducing response latency.
For more information, see documentation.
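Setting the option for one session can be sketched as:

```sql
-- Prefer the leader replica, falling back to followers when the leader is slow.
SET SESSION tidb_replica_read = 'prefer-leader';

-- Confirm the current value.
SELECT @@tidb_replica_read;
```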
SQL
Time to live (TTL) is generally available #39262 @lcwangchao @YangKeao
TTL provides row-level lifecycle control policies. In TiDB, tables with TTL attributes set automatically check and delete expired row data based on the configuration. The goal of TTL is to help users periodically clean up unnecessary data in time while minimizing the impact on cluster workloads.
For more information, see documentation.
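A minimal sketch of a table with a TTL attribute follows. The table name, columns, and the three-month retention period are hypothetical choices for illustration.

```sql
-- Rows become eligible for deletion three months after created_at.
CREATE TABLE session_log (
    id BIGINT PRIMARY KEY,
    payload VARCHAR(255),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) TTL = `created_at` + INTERVAL 3 MONTH;

-- Temporarily pause expiry for this table without dropping the TTL definition.
ALTER TABLE session_log TTL_ENABLE = 'OFF';
```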
Support ALTER TABLE...REORGANIZE PARTITION #15000 @mjonss
TiDB supports the ALTER TABLE...REORGANIZE PARTITION syntax. Using this syntax, you can reorganize some or all of the partitions of a table, including merging, splitting, or other modifications, without losing data.
For more information, see documentation.
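Merging and splitting can be sketched on a hypothetical Range-partitioned table as follows. The table, partition names, and boundaries are illustrative only.

```sql
CREATE TABLE measurements (
    id INT NOT NULL,
    reading INT
)
PARTITION BY RANGE (id) (
    PARTITION p0 VALUES LESS THAN (100),
    PARTITION p1 VALUES LESS THAN (200),
    PARTITION p2 VALUES LESS THAN (300)
);

-- Merge two adjacent partitions into one.
ALTER TABLE measurements REORGANIZE PARTITION p0, p1 INTO (
    PARTITION p01 VALUES LESS THAN (200)
);

-- Split one partition into two.
ALTER TABLE measurements REORGANIZE PARTITION p2 INTO (
    PARTITION p2a VALUES LESS THAN (250),
    PARTITION p2b VALUES LESS THAN (300)
);
```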
Support Key partitioning #41364 @TonsnakeLin
Now TiDB supports Key partitioning. Both Key partitioning and Hash partitioning can evenly distribute data into a certain number of partitions. The difference is that Hash partitioning only supports distributing data based on a specified integer expression or an integer column, while Key partitioning supports distributing data based on a column list, and partitioning columns of Key partitioning are not limited to the integer type.
For more information, see documentation.
DB operations
TiCDC supports replicating change data to storage services (GA) #6797 @zhaoxinyu
TiCDC supports replicating changed data to Amazon S3, GCS, Azure Blob Storage, NFS, and other S3-compatible storage services. Storage services are reasonably priced and easy to use. If you are not using Kafka, you can use storage services. TiCDC saves the changed logs to a file and then sends it to the storage services instead. From the storage services, your own consumer program can read the newly generated changed log files periodically. Currently, TiCDC supports replicating changed logs in canal-json and CSV formats to the storage service.
For more information, see documentation.
TiCDC OpenAPI v2 #8019 @sdojjy
TiCDC provides OpenAPI v2. Compared with OpenAPI v1, OpenAPI v2 provides more comprehensive support for replication tasks. The features provided by TiCDC OpenAPI are a subset of the cdc cli tool. You can query and operate TiCDC clusters via OpenAPI v2, such as getting TiCDC node status, checking cluster health status, and managing replication tasks.
For more information, see documentation.
DBeaver v23.0.1 supports TiDB by default #17396 @Icemap
- Provides an independent TiDB module, icon, and logo.
- The default configuration supports TiDB Cloud Serverless, making it easier to connect to TiDB Cloud Serverless.
- Supports identifying TiDB versions to display or hide foreign key tabs.
- Supports visualizing SQL execution plans in EXPLAIN results.
- Supports highlighting TiDB keywords such as PESSIMISTIC, OPTIMISTIC, AUTO_RANDOM, PLACEMENT, POLICY, REORGANIZE, EXCHANGE, CACHE, NONCLUSTERED, and CLUSTERED.
- Supports highlighting TiDB functions such as TIDB_BOUNDED_STALENESS, TIDB_DECODE_KEY, TIDB_DECODE_PLAN, TIDB_IS_DDL_OWNER, TIDB_PARSE_TSO, TIDB_VERSION, TIDB_DECODE_SQL_DIGESTS, and TIDB_SHARD.
For more information, see DBeaver documentation.
Data migration
Enhance the functionalities of LOAD DATA statements and support importing data from cloud storage (experimental) #40499 @lance6716
Before TiDB v7.0.0, the LOAD DATA statement could only import data files from the client side. If you wanted to import data from cloud storage, you had to rely on TiDB Lightning. However, deploying TiDB Lightning separately would bring additional deployment and management costs. In v7.0.0, you can directly import data from cloud storage using the LOAD DATA statement. Some examples of the feature are as follows:
- Supports importing data from Amazon S3 and Google Cloud Storage to TiDB. Supports importing multiple source files to TiDB in one go with wildcards.
- Supports using DEFINED NULL BY to define null.
- Supports source files in CSV and TSV formats.
For more information, see documentation.
TiDB Lightning supports enabling compressed transfers when sending key-value pairs to TiKV (GA) #41163 @sleepymole
Starting from v6.6.0, TiDB Lightning supports compressing locally encoded and sorted key-value pairs for network transfer when sending them to TiKV, thus reducing the amount of data transferred over the network and lowering the network bandwidth overhead. In the earlier TiDB versions before this feature is supported, TiDB Lightning requires relatively high network bandwidth and incurs high traffic charges in case of large data volumes.
In v7.0.0, this feature becomes GA and is disabled by default. To enable it, you can set the compress-kv-pairs configuration item of TiDB Lightning to "gzip" or "gz".
For more information, see documentation.
Compatibility changes
MySQL compatibility
TiDB removes the constraint that the auto-increment column must be an index #40580 @tiancaiamao
Before v7.0.0, TiDB's behavior is consistent with MySQL, requiring the auto-increment column to be an index or index prefix. Starting from v7.0.0, TiDB removes the constraint that the auto-increment column must be an index or index prefix. Now you can define the primary key of a table more flexibly and use the auto-increment column to implement sorting and pagination more conveniently. This also avoids the write hotspot problem caused by the auto-increment column and improves query performance by using the table with clustered indexes. With the new release, you can create a table using the following syntax:
CREATE TABLE test1 (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `k` int(11) NOT NULL DEFAULT '0',
    `c` char(120) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
    PRIMARY KEY(`k`, `id`)
);
This feature does not affect TiCDC data replication.
For more information, see documentation.
TiDB supports Key partitions, as shown in the following example:
CREATE TABLE employees (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE DEFAULT '9999-12-31',
    job_code INT,
    store_id INT
) PARTITION BY KEY(store_id) PARTITIONS 4;
Starting from v7.0.0, TiDB supports Key partitions and can parse the MySQL PARTITION BY LINEAR KEY syntax. However, TiDB ignores the LINEAR keyword and uses a non-linear hash algorithm instead. Currently, the KEY partition type does not support partition statements with an empty partition column list.
For more information, see documentation.
Behavior changes
TiCDC fixes the issue of incorrect encoding of FLOAT data in Avro #8490 @3AceShowHand
When upgrading the TiCDC cluster to v7.0.0, if a table replicated using Avro contains the FLOAT data type, you need to manually adjust the compatibility policy of Confluent Schema Registry to None before upgrading so that the changefeed can successfully update the schema. Otherwise, after upgrading, the changefeed will be unable to update the schema and enter an error state.
Starting from v7.0.0, the tidb_dml_batch_size system variable no longer takes effect on the LOAD DATA statement.
System variables
Variable name | Change type | Description |
---|---|---|
tidb_pessimistic_txn_aggressive_locking | Deleted | This variable is renamed to tidb_pessimistic_txn_fair_locking. |
tidb_enable_non_prepared_plan_cache | Modified | Takes effect starting from v7.0.0 and controls whether to enable the Non-prepared plan cache feature. |
tidb_enable_null_aware_anti_join | Modified | Changes the default value from OFF to ON after further tests, meaning that TiDB applies Null-Aware Hash Join when Anti Join is generated by subqueries led by special set operators NOT IN and != ALL by default. |
tidb_enable_resource_control | Modified | Changes the default value from OFF to ON, meaning that the cluster isolates resources by resource group by default. Resource Control is enabled by default in v7.0.0, so that you can use this feature whenever you want. |
tidb_non_prepared_plan_cache_size | Modified | Takes effect starting from v7.0.0 and controls the maximum number of execution plans that can be cached by Non-prepared plan cache. |
tidb_rc_read_check_ts | Modified | Starting from v7.0.0, this variable is no longer effective for cursor fetch read in the prepared statement protocol. |
tidb_enable_inl_join_inner_multi_pattern | Newly added | This variable controls whether Index Join is supported when the inner table has Selection or Projection operators on it. |
tidb_enable_plan_cache_for_subquery | Newly added | This variable controls whether Prepared Plan Cache caches queries that contain subqueries. |
tidb_enable_plan_replayer_continuous_capture | Newly added | This variable controls whether to enable the PLAN REPLAYER CONTINUOUS CAPTURE feature. The default value OFF means to disable the feature. |
tidb_load_based_replica_read_threshold | Newly added | This variable sets the threshold for triggering load-based replica read. The feature controlled by this variable is not fully functional in TiDB v7.0.0. Do not change the default value. |
tidb_opt_advanced_join_hint | Newly added | This variable controls whether the join method hint influences the optimization of join reorder. The default value is ON, which means the new compatible control mode is used. The value OFF means the behavior before v7.0.0 is used. For forward compatibility, the value of this variable is set to OFF when the cluster is upgraded from an earlier version to v7.0.0 or later. |
tidb_opt_derive_topn | Newly added | This variable controls whether to enable the Derive TopN or Limit from Window Functions optimization rule. The default value is OFF, which means the optimization rule is not enabled. |
tidb_opt_enable_late_materialization | Newly added | This variable controls whether to enable the TiFlash Late Materialization feature. The default value is OFF, which means the feature is not enabled. |
tidb_opt_ordering_index_selectivity_threshold | Newly added | This variable controls how the optimizer selects indexes when the SQL statement contains ORDER BY and LIMIT clauses and has filtering conditions. |
tidb_pessimistic_txn_fair_locking | Newly added | Controls whether to enable the enhanced pessimistic lock-waking model to reduce the tail latency of transactions under single-row conflict scenarios. The default value is ON. When the cluster is upgraded from an earlier version to v7.0.0 or later, the value of this variable is set to OFF. |
tidb_ttl_running_tasks | Newly added | This variable is used to limit the concurrency of TTL tasks across the entire cluster. The default value -1 means that the number of TTL tasks is the same as the number of TiKV nodes. |
Configuration file parameters
Configuration file | Configuration parameter | Change type | Description |
---|---|---|---|
TiKV | server.snap-max-write-bytes-per-sec | Deleted | This parameter is renamed to server.snap-io-max-bytes-per-sec. |
TiKV | raft-engine.enable-log-recycle | Modified | The default value changes from false to true. |
TiKV | resolved-ts.advance-ts-interval | Modified | The default value changes from "1s" to "20s". This modification can increase the interval of the regular advancement of Resolved TS and reduce the traffic consumption between TiKV nodes. |
TiKV | resource-control.enabled | Modified | The default value changes from false to true. |
TiKV | raft-engine.prefill-for-recycle | Newly added | Controls whether to generate empty log files for log recycling in Raft Engine. The default value is false. |
PD | degraded-mode-wait-duration | Newly added | A Resource Control-related configuration item. It controls the waiting time for triggering the degraded mode. The default value is 0s. |
PD | read-base-cost | Newly added | A Resource Control-related configuration item. It controls the basis factor for conversion from a read request to RU. The default value is 0.25. |
PD | read-cost-per-byte | Newly added | A Resource Control-related configuration item. It controls the basis factor for conversion from read flow to RU. The default value is 1/(64 * 1024). |
PD | read-cpu-ms-cost | Newly added | A Resource Control-related configuration item. It controls the basis factor for conversion from CPU to RU. The default value is 1/3. |
PD | write-base-cost | Newly added | A Resource Control-related configuration item. It controls the basis factor for conversion from a write request to RU. The default value is 1. |
PD | write-cost-per-byte | Newly added | A Resource Control-related configuration item. It controls the basis factor for conversion from write flow to RU. The default value is 1/1024. |
TiFlash | mark_cache_size | Modified | Change the default cache limit of the metadata for a data block in TiFlash from 5368709120 to 1073741824 to reduce unnecessary memory usage. |
TiFlash | minmax_index_cache_size | Modified | Change the default cache limit of the min-max index for a data block in TiFlash from 5368709120 to 1073741824 to reduce unnecessary memory usage. |
TiFlash | flash.disaggregated_mode | Newly added | In the disaggregated architecture of TiFlash, it indicates whether this TiFlash node is a write node or a compute node. The value can be tiflash_write or tiflash_compute. |
TiFlash | storage.s3.endpoint | Newly added | The endpoint to connect to S3. |
TiFlash | storage.s3.bucket | Newly added | The bucket where TiFlash stores all data. |
TiFlash | storage.s3.root | Newly added | The root directory of data storage in S3 bucket. |
TiFlash | storage.s3.access_key_id | Newly added | ACCESS_KEY_ID for accessing S3. |
TiFlash | storage.s3.secret_access_key | Newly added | SECRET_ACCESS_KEY for accessing S3. |
TiFlash | storage.remote.cache.dir | Newly added | The local data cache directory of TiFlash compute node. |
TiFlash | storage.remote.cache.capacity | Newly added | The size of the local data cache directory of TiFlash compute node. |
TiDB Lightning | add-index-by-sql | Newly added | Controls whether to use SQL to add indexes in physical import mode. The default value is false, which means that TiDB Lightning will encode both row data and index data into KV pairs and import them into TiKV together. The advantage of adding indexes using SQL is that it separates the import of data from the import of indexes, so the data itself can be imported quickly. Even if the index creation fails after the data is imported, the data consistency is not affected. |
TiCDC | enable-table-across-nodes | Newly added | Determines whether to divide a table into multiple sync ranges according to the number of Regions. These ranges can be replicated by multiple TiCDC nodes. |
TiCDC | region-threshold | Newly added | When enable-table-across-nodes is enabled, this feature only takes effect on tables with more than region-threshold Regions. |
DM | analyze | Newly added | Controls whether to execute the ANALYZE TABLE <table> operation on each table after CHECKSUM is completed. It can be configured as "required"/"optional"/"off". The default value is "optional". |
DM | range-concurrency | Newly added | Controls the concurrency of dm-worker writing KV data to TiKV. |
DM | compress-kv-pairs | Newly added | Controls whether to enable compression when dm-worker sends KV data to TiKV. Currently, only gzip is supported. The default value is empty, which means no compression. |
DM | pd-addr | Newly added | Controls the address of the downstream PD server in the Physical Import mode. You can fill in either one or more PD servers. If this configuration item is empty, use the PD address information from the TiDB query by default. |
Improvements
TiDB
- Introduce the
EXPAND
operator to optimize the performance of SQL queries with multipleDISTINCT
in a singleSELECT
statement #16581 @AilinKid - Support more SQL formats for Index Join #40505 @Yisaer
- Avoid globally sorting partitioned table data in TiDB in some cases #26166 @Defined2014
- Support using fair lock mode and lock only if exists at the same time #42068 @MyonKeminta
- Support printing transaction slow logs and transaction internal events #41863 @ekexium
- Support the ILIKE operator #40943 @xzhangxian1008
PD
TiFlash
- Reduce TiFlash's memory usage on write path #7144 @hongyunyan
- Reduce TiFlash's restart time in scenarios with many tables #7146 @hongyunyan
- Support pushing down the ILIKE operator #6740 @xzhangxian1008
Tools
TiCDC
- Support distributing data changes of a single large table to multiple TiCDC nodes in scenarios where Kafka is the downstream, thus solving the scalability issue of single tables in data integration scenarios of large-scale TiDB clusters #8247 @overvenus

  You can enable this feature by setting the TiCDC configuration item enable-table-across-nodes to true. You can use region-threshold to specify that when the number of Regions for a table exceeds this threshold, TiCDC starts distributing data changes of the corresponding table to multiple TiCDC nodes.

- Support splitting transactions in the redo applier to improve its throughput and reduce RTO in disaster recovery scenarios #8318 @CharlesCheung96
- Improve the table scheduling to split a single table more evenly across various TiCDC nodes #8247 @overvenus
- Add the Large Row monitoring metrics in MQ sink #8286 @hi-rustin
- Reduce network traffic between TiKV and TiCDC nodes in scenarios where a Region contains data of multiple tables #6346 @overvenus
- Move the P99 metrics panel of Checkpoint TS and Resolved TS to the Lag analyze panel #8524 @hi-rustin
- Support applying DDL events in redo logs #8361 @CharlesCheung96
- Support splitting and scheduling tables to TiCDC nodes based on upstream write throughput #7720 @overvenus
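As a rough illustration of the two TiCDC items described above, a changefeed configuration fragment might look like the following. This is a sketch under assumptions, not a verified configuration: placing both items under a [scheduler] section and the threshold value of 100000 are illustrative choices that these notes do not state, so check them against the TiCDC changefeed configuration reference for your version.

```toml
# Hypothetical changefeed configuration fragment (section name is an assumption).
[scheduler]
# Allow a single table to be divided into multiple sync ranges
# that can be replicated by multiple TiCDC nodes.
enable-table-across-nodes = true
# Only tables with more Regions than this threshold are divided.
# The value below is an example, not a recommendation.
region-threshold = 100000
```

With a fragment like this, tables at or below the threshold keep being replicated by a single node, so small tables are unaffected by the setting.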
TiDB Lightning
- TiDB Lightning Physical Import Mode supports separating data import and index import to improve import speed and stability #42132 @sleepymole

  Add the add-index-by-sql parameter. The default value is false, which means that TiDB Lightning encodes both row data and index data into KV pairs and imports them into TiKV together. If you set it to true, TiDB Lightning adds indexes via the ADD INDEX SQL statement after importing the row data to improve import speed and stability.

- Add the tikv-importer.keyspace-name parameter. The default value is an empty string, which means TiDB Lightning automatically gets the key space name of the corresponding tenant to import data. If you specify a value, the specified key space name will be used to import data. This parameter provides flexibility in the configuration of TiDB Lightning when you import data to a multi-tenant TiDB cluster. #41915 @lichunzhu
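The two TiDB Lightning parameters above could be set together in a configuration fragment along these lines. Treat it as a minimal sketch: tikv-importer.keyspace-name is named in these notes, but placing add-index-by-sql in the same [tikv-importer] section is an assumption, as is selecting physical import mode with backend = "local".

```toml
[tikv-importer]
# Physical import mode (assumed to be selected via the "local" backend).
backend = "local"
# Import row data first, then create indexes with ADD INDEX statements.
# The default is false (row data and index data are imported together).
add-index-by-sql = true
# An empty string (the default) lets TiDB Lightning obtain the
# key space name of the corresponding tenant automatically.
keyspace-name = ""
```

Because index creation runs as a separate SQL step, a failed ADD INDEX can be retried after the import without affecting the consistency of the data that was already imported.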
Bug fixes
TiDB
- Fix the issue of missing updates when upgrading TiDB from v6.5.1 to a later version #41502 @chrysan
- Fix the issue that the default values of some system variables are not modified after upgrading #41423 @crazycs520
- Fix the issue that Coprocessor request types related to adding indexes are displayed as unknown #41400 @tangenta
- Fix the issue of returning "PessimisticLockNotFound" when adding an index #41515 @tangenta
- Fix the issue of mistakenly returning "found duplicate key" when adding a unique index #41630 @tangenta
- Fix the panic issue when adding an index #41880 @tangenta
- Fix the issue that TiFlash reports an error for generated columns during execution #40663 @guo-shaoge
- Fix the issue that TiDB might not be able to obtain statistics correctly when there is a time type #41938 @xuyifangreeneyes
- Fix the issue that full index scans might cause errors when prepared plan cache is enabled #42150 @fzzf678
- Fix the issue that IFNULL(NOT NULL COLUMN, ...) might return incorrect results #41734 @LittleFall
- Fix the issue that TiDB might produce incorrect results when all data in a partitioned table is in a single Region #41801 @Defined2014
- Fix the issue that TiDB might produce incorrect results when different partitioned tables appear in a single SQL statement #42135 @mjonss
- Fix the issue that statistics auto-collection might not trigger correctly on a partitioned table after adding a new index to the partitioned table #41638 @xuyifangreeneyes
- Fix the issue that TiDB might read incorrect column statistics information after collecting statistics twice in a row #42073 @xuyifangreeneyes
- Fix the issue that IndexMerge might produce incorrect results when prepared plan cache is enabled #41828 @qw4990
- Fix the issue that IndexMerge might have goroutine leakage #41605 @guo-shaoge
- Fix the issue that non-BIGINT unsigned integers might produce incorrect results when compared with strings/decimals #41736 @LittleFall
- Fix the issue that killing a previous ANALYZE statement due to memory over-limit might cause the current ANALYZE statement in the same session to be killed #41825 @XuHuaiyu
- Fix the issue that data race might occur during the information collection process of the batch coprocessor #41412 @you06
- Fix the issue that an assertion error prevents printing MVCC information for partitioned tables #40629 @ekexium
- Fix the issue that fair lock mode adds locking to non-existent keys #41527 @ekexium
- Fix the issue that INSERT IGNORE and REPLACE statements do not lock keys that do not modify values #42121 @zyguan
PD
- Fix the issue that the Region Scatter operation might cause uneven distribution of leaders #6017 @HunDunDM
- Fix the issue that data race might occur when getting PD members during startup #6069 @rleungx
- Fix the issue that data race might occur when collecting hotspot statistics #6069 @lhy1024
- Fix the issue that switching placement rule might cause uneven distribution of leaders #6195 @bufferflies
TiFlash
- Fix the issue that Decimal division does not round up the last digit in certain cases #7022 @LittleFall
- Fix the issue that Decimal cast rounds up incorrectly in certain cases #6994 @windtalker
- Fix the issue that TopN/Sort operators produce incorrect results after enabling the new collation #6807 @xzhangxian1008
- Fix the issue that TiFlash reports an error when aggregating a result set larger than 12 million rows on a single TiFlash node #6993 @windtalker
Tools
Backup & Restore (BR)
- Fix the issue of insufficient wait time for splitting Region retry during the PITR recovery process #42001 @joccau
- Fix the issue of recovery failures due to the "memory is limited" error encountered during the PITR recovery process #41983 @joccau
- Fix the issue that PITR log backup progress does not advance when a PD node is down #14184 @YuJuncen
- Alleviate the issue that the latency of the PITR log backup progress increases when Region leadership migration occurs #13638 @YuJuncen
TiCDC
- Fix the issue that restarting the changefeed might cause data loss or that the checkpoint cannot advance #8242 @overvenus
- Fix the data race issue in DDL sink #8238 @3AceShowHand
- Fix the issue that the changefeed in the stopped status might restart automatically #8330 @sdojjy
- Fix the issue that the TiCDC server panics when all downstream Kafka servers are unavailable #8523 @3AceShowHand
- Fix the issue that data might be lost when the downstream is MySQL and the executed statement is incompatible with TiDB #8453 @asddongmen
- Fix the issue that rolling upgrade might cause TiCDC OOM or that the checkpoint gets stuck #8329 @overvenus
- Fix the issue that graceful upgrade for TiCDC clusters fails on Kubernetes #8484 @overvenus
TiDB Data Migration (DM)
- Fix the issue that when a DM worker node uses Google Cloud Storage, breakpoints are written so frequently that the request frequency limit of Google Cloud Storage is reached, the DM worker cannot write data into Google Cloud Storage, and the full data load fails #8482 @maxshuang
- Fix the issue that when multiple DM tasks replicate the same downstream data at the same time and all use the downstream metadata table to record the breakpoint information, the breakpoint information of all tasks is written to the same metadata table and uses the same task ID #8500 @maxshuang
TiDB Lightning
- Fix the issue that when Physical Import Mode is used for importing data, if there is an auto_random column in the composite primary key of the target table, but the value of the column is not specified in the source data, TiDB Lightning does not generate data for the auto_random column automatically #41454 @D3Hunter
- Fix the issue that when Logical Import Mode is used for importing data, the import fails due to lack of the CONFIG permission for the target cluster #41915 @lichunzhu
Contributors
We would like to thank the following contributors from the TiDB community: