Configure a TiDB Cluster on Kubernetes

This document describes how to configure a TiDB cluster on Kubernetes for production deployment. It covers resource configuration, TiDB deployment configuration, and high availability configuration.

Configure resources

Before deploying a TiDB cluster, configure the resources for each component according to your needs. PD, TiKV, and TiDB are the core service components of a TiDB cluster, so in a production environment, size each of them according to its own requirements. For details, refer to Hardware Recommendations.

To ensure the proper scheduling and stable operation of the components of the TiDB cluster on Kubernetes, it is recommended to set Guaranteed-level quality of service (QoS) by making limits equal to requests when configuring resources. For details, refer to Configure Quality of Service for Pods.

If you are using a NUMA-based CPU, enable the static CPU management policy on the node for better performance. To allow a TiDB cluster component to use its CPU cores exclusively, in addition to setting Guaranteed-level QoS as described above, the CPU quota must be an integer greater than or equal to 1. For details, refer to Control CPU Management Policies on the Node.
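The resource guidance above can be sketched as a TidbCluster fragment. This is a minimal illustration, assuming the component accepts requests and limits in the same form that spec.discovery uses later in this document. The sizes are placeholder values, not sizing recommendations.

```yaml
# Sketch only: Guaranteed-level QoS for TiKV.
# requests equal limits, and the CPU quota is an integer >= 1
# so that the static CPU management policy can pin whole cores.
spec:
  tikv:
    requests:
      cpu: "4"        # placeholder value
      memory: 16Gi    # placeholder value
    limits:
      cpu: "4"        # must equal requests.cpu
      memory: 16Gi    # must equal requests.memory
```

Kubernetes assigns the Guaranteed class only when every container in the Pod has equal requests and limits for both CPU and memory, so any sidecar containers need the same treatment.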

Configure TiDB deployment

To configure a TiDB deployment, you need to configure the TidbCluster CR. For a sample, refer to the TidbCluster example. For the complete configuration of the TidbCluster CR, refer to the API documentation.

Cluster name

The cluster name can be configured by changing metadata.name in the TidbCluster CR.

Version

Usually, all components in a cluster use the same version. It is recommended to configure spec.<pd/tidb/tikv/pump/tiflash/ticdc>.baseImage and spec.version. If you need different versions for different components, configure spec.<pd/tidb/tikv/pump/tiflash/ticdc>.version.

Here are the formats of the parameters:

  • spec.version: the format is imageTag, such as v7.5.0

  • spec.<pd/tidb/tikv/pump/tiflash/ticdc>.baseImage: the format is imageName, such as pingcap/tidb

  • spec.<pd/tidb/tikv/pump/tiflash/ticdc>.version: the format is imageTag, such as v7.5.0
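As a rough illustration of how these fields fit together, the fragment below sets a cluster name, a cluster-wide version, and a per-component override. The cluster name basic and the v7.5.1 override are hypothetical values chosen for the example.

```yaml
apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: basic              # hypothetical cluster name
spec:
  version: v7.5.0          # image tag used by every component by default
  pd:
    baseImage: pingcap/pd
  tikv:
    baseImage: pingcap/tikv
  tidb:
    baseImage: pingcap/tidb
    version: v7.5.1        # hypothetical override for TiDB only
```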

configUpdateStrategy

The default value of the spec.configUpdateStrategy field is InPlace, which means that when you modify the config of a component, you need to manually trigger a rolling update to apply the new configuration to the cluster.

It is recommended that you configure spec.configUpdateStrategy: RollingUpdate to enable automatic update of configurations. In this way, every time the config of a component is updated, TiDB Operator automatically triggers a rolling update for the component and applies the modified configuration to the cluster.

enableDynamicConfiguration

It is recommended that you configure spec.enableDynamicConfiguration: true to enable the --advertise-status-addr startup parameter for TiKV.

Versions required:

  • TiDB 4.0.1 or later versions

pvReclaimPolicy

It is recommended that you configure spec.pvReclaimPolicy: Retain to ensure that the PV is retained even if the PVC is deleted. This is to ensure your data safety.
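The three cluster-level recommendations in this and the two preceding sections can be combined in one fragment. This is a sketch that uses only the field names and values given in the text.

```yaml
spec:
  # Roll out config changes automatically instead of requiring
  # a manually triggered rolling update.
  configUpdateStrategy: RollingUpdate
  # Enables the --advertise-status-addr startup parameter for TiKV
  # (requires TiDB 4.0.1 or later).
  enableDynamicConfiguration: true
  # Keep the PV, and therefore the data, when the PVC is deleted.
  pvReclaimPolicy: Retain
```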

mountClusterClientSecret

PD and TiKV support configuring mountClusterClientSecret. If TLS is enabled between cluster components, it is recommended to configure spec.pd.mountClusterClientSecret: true and spec.tikv.mountClusterClientSecret: true. With this configuration, TiDB Operator automatically mounts the ${cluster_name}-cluster-client-secret certificate to the PD and TiKV containers, so you can conveniently use pd-ctl and tikv-ctl.
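A minimal sketch of this configuration follows. It assumes that TLS between components is turned on through spec.tlsCluster.enabled, which this section does not describe. Treat that field as an assumption and check the API documentation for your TiDB Operator version.

```yaml
spec:
  tlsCluster:
    enabled: true                      # assumption: enables TLS between components
  pd:
    mountClusterClientSecret: true     # mounts ${cluster_name}-cluster-client-secret
  tikv:
    mountClusterClientSecret: true
```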

startScriptVersion

To choose the version of the startup script used by each component, configure the spec.startScriptVersion field in the cluster spec.

The supported versions of the start script are as follows:

  • v1 (default): the original version of the startup script.

  • v2: introduced in TiDB Operator v1.4.0 to optimize the start script for each component and to make sure that upgrading TiDB Operator does not result in a rolling restart of the cluster. Compared to v1, v2 has the following optimizations:

    • Use dig instead of nslookup to resolve DNS.
    • All components support debug mode.

It is recommended that you set spec.startScriptVersion to the latest version (v2) for new clusters.
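For a new cluster, the recommendation above amounts to a single field in the cluster spec:

```yaml
spec:
  startScriptVersion: v2   # omit the field to keep the default, v1
```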

Storage

Storage Class

You can set the storage class by modifying storageClassName of each component in ${cluster_name}/tidb-cluster.yaml and ${cluster_name}/tidb-monitor.yaml. For the storage classes supported by the Kubernetes cluster, check with your system administrator.

Different components of a TiDB cluster have different disk requirements. Before deploying a TiDB cluster, refer to the Storage Configuration document to select an appropriate storage class for each component according to the storage classes supported by the current Kubernetes cluster and usage scenario.

Multiple disks mounting

TiDB Operator supports mounting multiple PVs for PD, TiDB, TiKV, and TiCDC, which can be used for data writing for different purposes.

You can configure the storageVolumes field for each component to describe multiple user-customized PVs.

The meanings of the related fields are as follows:

  • storageVolume.name: The name of the PV.
  • storageVolume.storageClassName: The StorageClass that the PV uses. If not configured, spec.pd/tidb/tikv/ticdc.storageClassName will be used.
  • storageVolume.storageSize: The storage size of the requested PV.
  • storageVolume.mountPath: The path of the container to mount the PV to.

For example:

To mount multiple PVs for TiKV:

    tikv:
      ...
      config: |
        [rocksdb]
          wal-dir = "/data_sbi/tikv/wal"
        [titan]
          dirname = "/data_sbj/titan/data"
      storageVolumes:
      - name: wal
        storageSize: "2Gi"
        mountPath: "/data_sbi/tikv/wal"
      - name: titan
        storageSize: "2Gi"
        mountPath: "/data_sbj/titan/data"

To mount multiple PVs for TiDB:

    tidb:
      config: |
        path = "/tidb/data"
        [log.file]
          filename = "/tidb/log/tidb.log"
      storageVolumes:
      - name: data
        storageSize: "2Gi"
        mountPath: "/tidb/data"
      - name: log
        storageSize: "2Gi"
        mountPath: "/tidb/log"

To mount multiple PVs for PD:

    pd:
      config: |
        data-dir = "/pd/data"
        [log.file]
          filename = "/pd/log/pd.log"
      storageVolumes:
      - name: data
        storageSize: "10Gi"
        mountPath: "/pd/data"
      - name: log
        storageSize: "10Gi"
        mountPath: "/pd/log"

To mount multiple PVs for TiCDC:

    ticdc:
      ...
      config:
        dataDir: /ticdc/data
        logFile: /ticdc/log/cdc.log
      storageVolumes:
      - name: data
        storageSize: "10Gi"
        storageClassName: local-storage
        mountPath: "/ticdc/data"
      - name: log
        storageSize: "10Gi"
        storageClassName: local-storage
        mountPath: "/ticdc/log"

HostNetwork

For PD, TiKV, TiDB, TiFlash, TiCDC, and Pump, you can configure the Pods to use the network namespace of the host (HostNetwork).

To enable HostNetwork for all supported components, configure spec.hostNetwork: true.

To enable HostNetwork for specified components, configure hostNetwork: true for the components.
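The two options can be sketched as follows. The choice of TiKV in the second fragment is only an example.

```yaml
# Option 1: enable HostNetwork for all supported components.
spec:
  hostNetwork: true
---
# Option 2: enable HostNetwork for a single component (TiKV here).
spec:
  tikv:
    hostNetwork: true
```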

Discovery

TiDB Operator starts a Discovery service for each TiDB cluster. The Discovery service can return the corresponding startup parameters for each PD Pod to support the startup of the PD cluster. You can configure resources of the Discovery service using spec.discovery. For details, see Managing Resources for Containers.

A spec.discovery configuration example is as follows:

    spec:
      discovery:
        limits:
          cpu: "0.2"
        requests:
          cpu: "0.2"
        ...

Cluster topology

PD/TiKV/TiDB

The deployed cluster topology by default has three PD Pods, three TiKV Pods, and two TiDB Pods. In this deployment topology, the scheduler extender of TiDB Operator requires at least three nodes in the Kubernetes cluster to provide high availability. You can modify the replicas configuration to change the number of pods for each component.
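The default topology described above corresponds to the following replicas settings. Change the numbers to scale each component.

```yaml
spec:
  pd:
    replicas: 3
  tikv:
    replicas: 3
  tidb:
    replicas: 2
```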

Enable TiFlash

If you want to enable TiFlash in the cluster, configure spec.pd.config.replication.enable-placement-rules: true and configure spec.tiflash in the ${cluster_name}/tidb-cluster.yaml file as follows:

    pd:
      config: |
        ...
        [replication]
        enable-placement-rules = true
    tiflash:
      baseImage: pingcap/tiflash
      maxFailoverCount: 0
      replicas: 1
      storageClaims:
      - resources:
          requests:
            storage: 100Gi
        storageClassName: local-storage

TiFlash supports mounting multiple Persistent Volumes (PVs). If you want to configure multiple PVs for TiFlash, configure multiple resources entries in tiflash.storageClaims, each with its own storage request and storageClassName. For example:

    tiflash:
      baseImage: pingcap/tiflash
      maxFailoverCount: 0
      replicas: 1
      storageClaims:
      - resources:
          requests:
            storage: 100Gi
        storageClassName: local-storage
      - resources:
          requests:
            storage: 100Gi
        storageClassName: local-storage

TiFlash mounts all PVs to directories such as /data0 and /data1 in the container in the order of configuration. TiFlash has four log files. The proxy log is printed to the standard output of the container. The other three logs are stored on disk under the /data0 directory by default: /data0/logs/flash_cluster_manager.log, /data0/logs/error.log, and /data0/logs/server.log. To modify the log storage path, refer to Configure TiFlash parameters.

Enable TiCDC

If you want to enable TiCDC in the cluster, add the TiCDC spec to the TidbCluster CR. For example:

    spec:
      ticdc:
        baseImage: pingcap/ticdc
        replicas: 3

Configure TiDB components

This section introduces how to configure the parameters of TiDB/TiKV/PD/TiFlash/TiCDC.

Configure TiDB parameters

TiDB parameters can be configured by spec.tidb.config in TidbCluster Custom Resource.

For example:

    spec:
      tidb:
        config: |
          split-table = true
          oom-action = "log"

For all the configurable parameters of TiDB, refer to TiDB Configuration File.

Configure TiKV parameters

TiKV parameters can be configured by spec.tikv.config in TidbCluster Custom Resource.

For example:

    spec:
      tikv:
        config: |
          [storage]
            [storage.block-cache]
              capacity = "16GB"

For all the configurable parameters of TiKV, refer to TiKV Configuration File.

Configure PD parameters

PD parameters can be configured by spec.pd.config in TidbCluster Custom Resource.

For example:

    spec:
      pd:
        config: |
          lease = 3
          enable-prevote = true

For all the configurable parameters of PD, refer to PD Configuration File.

Configure TiFlash parameters

TiFlash parameters can be configured by spec.tiflash.config in TidbCluster Custom Resource.

For example:

    spec:
      tiflash:
        config:
          config: |
            [flash]
              [flash.flash_cluster]
                log = "/data0/logs/flash_cluster_manager.log"
            [logger]
              count = 10
              level = "information"
              errorlog = "/data0/logs/error.log"
              log = "/data0/logs/server.log"

For all the configurable parameters of TiFlash, refer to TiFlash Configuration File.

Configure TiCDC start parameters

You can configure TiCDC start parameters through spec.ticdc.config in TidbCluster Custom Resource.

For example:

For TiDB Operator v1.2.0-rc.2 and later versions, configure the parameters in the TOML format as follows:

    spec:
      ticdc:
        config: |
          gc-ttl = 86400
          log-level = "info"

For TiDB Operator versions earlier than v1.2.0-rc.2, configure the parameters in the YAML format as follows:

    spec:
      ticdc:
        config:
          timezone: UTC
          gcTTL: 86400
          logLevel: info

For all configurable start parameters of TiCDC, see TiCDC configuration.

Configure automatic failover thresholds of PD, TiDB, TiKV, and TiFlash

The automatic failover feature is enabled by default in TiDB Operator. When the Pods of PD, TiDB, TiKV, or TiFlash fail, or the corresponding nodes fail, TiDB Operator performs failover automatically and replenishes the number of Pod replicas by scaling the corresponding components.

To prevent the automatic failover feature from creating too many Pods, you can configure, for each component, the maximum number of Pods that TiDB Operator can create during failover. The default threshold is 3. Setting the threshold of a component to 0 disables automatic failover for that component. An example configuration is as follows:

    pd:
      maxFailoverCount: 3
    tidb:
      maxFailoverCount: 3
    tikv:
      maxFailoverCount: 3
    tiflash:
      maxFailoverCount: 3

Configure graceful upgrade for TiDB cluster

When you perform a rolling update to the TiDB cluster, Kubernetes sends a TERM signal to the TiDB server before it stops the TiDB Pod. When the TiDB server receives the TERM signal, it tries to wait for all connections to close. After 15 seconds, the TiDB server forcibly closes all the connections and exits the process.

You can enable graceful upgrade of the TiDB cluster by configuring the following items:

  • spec.tidb.terminationGracePeriodSeconds: The longest tolerable duration to delete the old TiDB Pod during the rolling upgrade. If this duration is exceeded, the TiDB Pod will be deleted forcibly.
  • spec.tidb.lifecycle: Sets the preStop hook for the TiDB Pod, which is the operation executed before the TiDB server stops.

    spec:
      tidb:
        terminationGracePeriodSeconds: 60
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - "sleep 10 && kill -QUIT 1"

The YAML file above:

  • Sets the longest tolerable duration to delete the TiDB Pod to 60 seconds. If the client does not close the connections after 60 seconds, these connections will be closed forcibly. You can adjust the value according to your needs.
  • Sets the value of preStop hook to sleep 10 && kill -QUIT 1. Here PID 1 refers to the PID of the TiDB server process in the TiDB Pod. When the TiDB server process receives the signal, it exits only after all the connections are closed by the client.

When Kubernetes deletes the TiDB Pod, it also removes the TiDB node from the service endpoints. This is to ensure that the new connection is not established to this TiDB node. However, because this process is asynchronous, you can make the system sleep for a few seconds before you send the kill signal, which makes sure that the TiDB node is removed from the endpoints.

Configure graceful upgrade for TiKV cluster

During a TiKV upgrade, TiDB Operator evicts all Region leaders from a TiKV Pod before restarting it. The TiKV Pod is restarted only after the eviction is completed (that is, the number of Region leaders on the TiKV Pod drops to 0) or the eviction exceeds the specified timeout (1500 minutes by default). If TiKV has fewer than 2 replicas, TiDB Operator forces an upgrade without waiting for the timeout.

If the eviction of Region leaders exceeds the specified timeout, restarting the TiKV Pod can cause issues such as failed requests or increased latency. To avoid these issues, you can set the timeout spec.tikv.evictLeaderTimeout (1500 minutes by default) to a larger value. For example:

    spec:
      tikv:
        evictLeaderTimeout: 10000m

Configure graceful upgrade for TiCDC cluster

During a TiCDC upgrade, TiDB Operator drains all replication workloads from a TiCDC Pod before restarting it. The TiCDC Pod is restarted only after the draining is completed or the draining exceeds the specified timeout (10 minutes by default). If TiCDC has fewer than 2 instances, TiDB Operator forces an upgrade without waiting for the timeout.

If the draining exceeds the specified timeout, restarting the TiCDC Pod can cause issues such as increased replication latency. To avoid these issues, you can set the timeout spec.ticdc.gracefulShutdownTimeout (10 minutes by default) to a larger value. For example:

    spec:
      ticdc:
        gracefulShutdownTimeout: 100m

Configure PV for TiDB slow logs

By default, TiDB Operator creates a slowlog volume (which is an EmptyDir) to store the slow logs, mounts the slowlog volume to /var/log/tidb, and prints the slow logs to stdout through a sidecar container.

If you want to use a separate PV to store the slow logs, you can specify the name of the PV in spec.tidb.slowLogVolumeName, and then configure the PV in spec.tidb.storageVolumes or spec.tidb.additionalVolumes.

This section shows how to configure PV using spec.tidb.storageVolumes or spec.tidb.additionalVolumes.

Configure using spec.tidb.storageVolumes

Configure the TidbCluster CR as the following example. In the example, TiDB Operator uses the ${volumeName} PV to store slow logs. The log file path is ${mountPath}/${volumeName}.

For how to configure the spec.tidb.storageVolumes field, refer to Multiple disks mounting.

    tidb:
      ...
      separateSlowLog: true  # can be ignored
      slowLogVolumeName: ${volumeName}
      storageVolumes:
        # name must be consistent with slowLogVolumeName
        - name: ${volumeName}
          storageClassName: ${storageClass}
          storageSize: "1Gi"
          mountPath: ${mountPath}

Configure using spec.tidb.additionalVolumes

In the following example, NFS is used as the storage, and TiDB Operator uses the ${volumeName} PV to store slow logs. The log file path is ${mountPath}/${volumeName}.

For the supported PV types, refer to Persistent Volumes.

    tidb:
      ...
      separateSlowLog: true  # can be ignored
      slowLogVolumeName: ${volumeName}
      additionalVolumes:
      # name must be consistent with slowLogVolumeName
      - name: ${volumeName}
        nfs:
          server: 192.168.0.2
          path: /nfs
      additionalVolumeMounts:
      # name must be consistent with slowLogVolumeName
      - name: ${volumeName}
        mountPath: ${mountPath}

Configure TiDB service

You need to configure spec.tidb.service so that TiDB Operator creates a service for TiDB. You can configure Service with different types according to the scenarios, such as ClusterIP, NodePort, LoadBalancer, and so on.

General configurations

Different types of services share some general configurations as follows:

  • spec.tidb.service.annotations: the annotations added to the Service resource.
  • spec.tidb.service.labels: the labels added to the Service resource.
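As an illustration, the fragment below attaches one annotation and one label to the TiDB Service. The keys and values (example.com/owner, environment) are hypothetical placeholders, not names that TiDB Operator or Kubernetes define.

```yaml
spec:
  tidb:
    service:
      type: ClusterIP
      annotations:
        example.com/owner: "db-team"    # hypothetical annotation
      labels:
        environment: "production"       # hypothetical label
```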

ClusterIP

ClusterIP exposes services through the internal IP of the cluster. When selecting this type of service, you can only access it within the cluster using ClusterIP or the Service domain name (${cluster_name}-tidb.${namespace}).

    spec:
      ...
      tidb:
        service:
          type: ClusterIP

NodePort

If there is no LoadBalancer, you can choose to expose the service through NodePort. NodePort exposes services through the node's IP and static port. You can access a NodePort service from outside of the cluster by requesting NodeIP + NodePort.

    spec:
      ...
      tidb:
        service:
          type: NodePort
          # externalTrafficPolicy: Local

NodePort has two modes:

  • externalTrafficPolicy=Cluster (default): All machines in the cluster allocate a NodePort port to TiDB.

    When using the Cluster mode, you can access the TiDB service through the IP and NodePort of any machine. If there is no TiDB Pod on the machine, the corresponding request will be forwarded to the machine with TiDB Pod.

  • externalTrafficPolicy=Local: Only the machine that TiDB is running on allocates a NodePort port to access the local TiDB instance.

LoadBalancer

If the TiDB cluster runs in an environment with LoadBalancer, such as on GCP or AWS, it is recommended to use the LoadBalancer feature of these cloud platforms by setting spec.tidb.service.type to LoadBalancer.

    spec:
      ...
      tidb:
        service:
          annotations:
            cloud.google.com/load-balancer-type: "Internal"
          externalTrafficPolicy: Local
          type: LoadBalancer

See Kubernetes Service Documentation to learn more about the features of Service and what LoadBalancer in the cloud platform supports.

IPv6 Support

Starting from v6.5.1, TiDB supports using IPv6 addresses for all network connections. If you deploy TiDB using TiDB Operator v1.4.3 or later versions, you can enable the TiDB cluster to listen on IPv6 addresses by configuring spec.preferIPv6 to true.

    spec:
      preferIPv6: true
      # ...

Configure high availability

TiDB is a distributed database. High availability means that when any physical topology node fails, the service is unaffected and the data remains complete and available. The following sections describe these two aspects separately: high availability of the TiDB service and high availability of data.

High availability of TiDB service

Use nodeSelector to schedule Pods

By configuring the nodeSelector field of each component, you can specify the specific nodes that the component Pods are scheduled onto. For details on nodeSelector, refer to nodeSelector.

    apiVersion: pingcap.com/v1alpha1
    kind: TidbCluster
    # ...
    spec:
      pd:
        nodeSelector:
          # label values are strings, so true is quoted
          node-role.kubernetes.io/pd: "true"
        # ...
      tikv:
        nodeSelector:
          node-role.kubernetes.io/tikv: "true"
        # ...
      tidb:
        nodeSelector:
          node-role.kubernetes.io/tidb: "true"
        # ...

Use tolerations to schedule Pods

By configuring the tolerations field of each component, you can allow the component Pods to be scheduled onto nodes with matching taints. For details on taints and tolerations, refer to Taints and Tolerations.

    apiVersion: pingcap.com/v1alpha1
    kind: TidbCluster
    # ...
    spec:
      pd:
        tolerations:
          - effect: NoSchedule
            key: dedicated
            operator: Equal
            value: pd
        # ...
      tikv:
        tolerations:
          - effect: NoSchedule
            key: dedicated
            operator: Equal
            value: tikv
        # ...
      tidb:
        tolerations:
          - effect: NoSchedule
            key: dedicated
            operator: Equal
            value: tidb
        # ...

Use affinity to schedule Pods

By configuring PodAntiAffinity, you can avoid the situation in which different instances of the same component are deployed on the same physical topology node. In this way, disaster recovery (high availability) is achieved. For the user guide of Affinity, see Affinity & AntiAffinity.

The following is an example of a typical service high availability setup:

    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        # this term works when the nodes have the label named region
        - weight: 10
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app.kubernetes.io/instance: ${cluster_name}
                app.kubernetes.io/component: "pd"
            topologyKey: "region"
            namespaces:
            - ${namespace}
        # this term works when the nodes have the label named zone
        - weight: 20
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app.kubernetes.io/instance: ${cluster_name}
                app.kubernetes.io/component: "pd"
            topologyKey: "zone"
            namespaces:
            - ${namespace}
        # this term works when the nodes have the label named rack
        - weight: 40
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app.kubernetes.io/instance: ${cluster_name}
                app.kubernetes.io/component: "pd"
            topologyKey: "rack"
            namespaces:
            - ${namespace}
        # this term works when the nodes have the label named kubernetes.io/hostname
        - weight: 80
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app.kubernetes.io/instance: ${cluster_name}
                app.kubernetes.io/component: "pd"
            topologyKey: "kubernetes.io/hostname"
            namespaces:
            - ${namespace}

Use topologySpreadConstraints to make pods evenly spread

By configuring topologySpreadConstraints, you can make pods evenly spread in different topologies. For instructions about configuring topologySpreadConstraints, see Pod Topology Spread Constraints.

To use topologySpreadConstraints, you must meet the following conditions:

  • Your Kubernetes cluster uses default-scheduler instead of tidb-scheduler. For details, refer to tidb-scheduler and default-scheduler.
  • Your Kubernetes cluster enables the EvenPodsSpread feature gate. If the Kubernetes version in use is earlier than v1.16 or if the EvenPodsSpread feature gate is disabled, the configuration of topologySpreadConstraints does not take effect.

You can either configure topologySpreadConstraints at a cluster level (spec.topologySpreadConstraints) for all components or at a component level (such as spec.tidb.topologySpreadConstraints) for specific components.

The following is an example configuration:

    topologySpreadConstraints:
    - topologyKey: kubernetes.io/hostname
    - topologyKey: topology.kubernetes.io/zone

The example configuration can make pods of the same component evenly spread on different zones and nodes.

Currently, topologySpreadConstraints only supports the configuration of the topologyKey field. In the pod spec, the above example configuration will be automatically expanded as follows:

    topologySpreadConstraints:
    - topologyKey: kubernetes.io/hostname
      maxSkew: 1
      whenUnsatisfiable: DoNotSchedule
      labelSelector: <object>
    - topologyKey: topology.kubernetes.io/zone
      maxSkew: 1
      whenUnsatisfiable: DoNotSchedule
      labelSelector: <object>
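To show where the cluster-level and component-level settings mentioned earlier sit in the TidbCluster CR, here is a minimal sketch. The particular topology keys are illustrative, and how TiDB Operator combines the two levels for a component that sets both is not covered here.

```yaml
spec:
  # Cluster level: applies to all components.
  topologySpreadConstraints:
  - topologyKey: topology.kubernetes.io/zone
  tidb:
    # Component level: applies to TiDB Pods only.
    topologySpreadConstraints:
    - topologyKey: kubernetes.io/hostname
```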

High availability of data

Before configuring the high availability of data, read Information Configuration of the Cluster Topology, which describes how high availability of a TiDB cluster is implemented.

To add the data high availability feature on Kubernetes:

  • Set the label collection of topological location for PD.

    Replace the location-labels information in the pd.config with the label collection that describes the topological location on the nodes in the Kubernetes cluster.

  • Set the topological information of the Node where the TiKV node is located.

    TiDB Operator automatically obtains the topological information of the Node for TiKV and calls the PD interface to set this information as the information of TiKV's store labels. Based on this topological information, the TiDB cluster schedules the replicas of the data.

    If the Node of the current Kubernetes cluster does not have a label indicating the topological location, or if the existing label name of topology contains /, you can manually add a label to the Node by running the following command:

    kubectl label node ${node_name} region=${region_name} zone=${zone_name} rack=${rack_name} kubernetes.io/hostname=${host_name}

    In the command above, region, zone, rack, and kubernetes.io/hostname are just examples. The name and number of the label to be added can be arbitrarily defined, as long as it conforms to the specification and is consistent with the labels set by location-labels in pd.config.

  • Set the topological information of the Node where the TiDB node is located.

    Since TiDB Operator v1.4.0, if the deployed TiDB version is v6.3.0 or later, TiDB Operator automatically obtains the topological information of the Node for TiDB and calls the corresponding interface of the TiDB server to set this information as TiDB's labels. Based on these labels, TiDB sends the Follower Read requests to the correct replicas.

    Currently, TiDB Operator automatically sets the labels for the TiDB server corresponding to the location-labels in pd.config. TiDB depends on the zone label to support some features of Follower Read. TiDB Operator obtains the value of zone, failure-domain.beta.kubernetes.io/zone, and topology.kubernetes.io/zone labels as zone. TiDB Operator only sets labels of the node where the TiDB server is located and ignores other labels.

Since v1.4.0, when setting labels for TiKV and TiDB nodes, TiDB Operator supports setting shortened aliases for some labels provided by Kubernetes by default. In some scenarios, using aliases can help optimize the scheduling performance of PD. When you use TiDB Operator to set aliases for the location-labels of PD, if there are no corresponding labels for a node, then TiDB Operator uses the original labels automatically.

Currently, TiDB Operator supports the following label aliases:

  • region: corresponds to topology.kubernetes.io/region and failure-domain.beta.kubernetes.io/region.
  • zone: corresponds to topology.kubernetes.io/zone and failure-domain.beta.kubernetes.io/zone.
  • host: corresponds to kubernetes.io/hostname.

For example, if labels such as region, zone, and host are not set on each node of Kubernetes, setting the location-labels of PD as ["topology.kubernetes.io/region", "topology.kubernetes.io/zone", "kubernetes.io/hostname"] is the same as ["region", "zone", "host"].
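A minimal sketch of the alias form in the TidbCluster CR follows. It assumes the nodes carry the standard Kubernetes region, zone, and hostname labels that the aliases map to.

```yaml
spec:
  pd:
    config: |
      [replication]
      # Aliases for topology.kubernetes.io/region,
      # topology.kubernetes.io/zone, and kubernetes.io/hostname.
      location-labels = ["region", "zone", "host"]
```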
