Configure a TiDB Cluster on Kubernetes
This document introduces how to configure a TiDB cluster for production deployment, covering resource configuration, deployment configuration, and high availability configuration.
Configure resources
Before deploying a TiDB cluster, you need to configure the resources for each component of the cluster depending on your needs. PD, TiKV, and TiDB are the core service components of a TiDB cluster. In a production environment, configure the resources of these components according to their needs. For details, refer to Hardware Recommendations.
To ensure the proper scheduling and stable operation of the components of the TiDB cluster on Kubernetes, it is recommended to set Guaranteed-level quality of service (QoS) by making `limits` equal to `requests` when configuring resources. For details, refer to Configure Quality of Service for Pods.
If you are using a NUMA-based CPU, you need to enable the `Static` CPU management policy on the node for better performance. To allow the TiDB cluster component to monopolize the corresponding CPU resources, the CPU quota must be an integer greater than or equal to `1`, in addition to setting the Guaranteed-level QoS as mentioned above. For details, refer to Control CPU Management Policies on the Node.
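For example, a minimal sketch of a Guaranteed-level resource configuration for TiKV; the resource values are illustrative, so size them according to Hardware Recommendations:

spec:
  tikv:
    requests:
      # storage sets the size of the PVC requested for the TiKV data volume
      storage: "100Gi"
      cpu: "4"
      memory: "16Gi"
    limits:
      # limits equal to requests gives the Pod Guaranteed-level QoS;
      # the integer CPU quota also satisfies the Static CPU management policy
      cpu: "4"
      memory: "16Gi"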
Configure TiDB deployment
To configure a TiDB deployment, you need to configure the `TidbCluster` CR. Refer to the TidbCluster example for an example. For the complete configurations of the `TidbCluster` CR, refer to the API documentation.
Cluster name
The cluster name can be configured by changing `metadata.name` in the `TidbCluster` CR.
Version
Usually, components in a cluster are of the same version. It is recommended to configure `spec.<pd/tidb/tikv/pump/tiflash/ticdc>.baseImage` and `spec.version`. If you need to configure different versions for different components, you can configure `spec.<pd/tidb/tikv/pump/tiflash/ticdc>.version`.
Here are the formats of the parameters:
- `spec.version`: the format is `imageTag`, such as `v6.1.0`
- `spec.<pd/tidb/tikv/pump/tiflash/ticdc>.baseImage`: the format is `imageName`, such as `pingcap/tidb`
- `spec.<pd/tidb/tikv/pump/tiflash/ticdc>.version`: the format is `imageTag`, such as `v6.1.0`
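For example, a sketch that runs all components at the same version, with an illustrative per-component override for TiCDC:

spec:
  version: v6.1.0
  pd:
    baseImage: pingcap/pd
  tikv:
    baseImage: pingcap/tikv
  tidb:
    baseImage: pingcap/tidb
  ticdc:
    baseImage: pingcap/ticdc
    # overrides spec.version for TiCDC only (illustrative)
    version: v6.1.1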
Recommended configuration
configUpdateStrategy
The default value of the `spec.configUpdateStrategy` field is `InPlace`, which means that when you modify the `config` of a component, you need to manually trigger a rolling update to apply the new configurations to the cluster.
It is recommended that you configure `spec.configUpdateStrategy: RollingUpdate` to enable automatic update of configurations. In this way, every time the `config` of a component is updated, TiDB Operator automatically triggers a rolling update for the component and applies the modified configuration to the cluster.
enableDynamicConfiguration
It is recommended that you configure `spec.enableDynamicConfiguration: true` to enable the `--advertise-status-addr` startup parameter for TiKV.
Versions required:
- TiDB 4.0.1 or later versions
pvReclaimPolicy
It is recommended that you configure `spec.pvReclaimPolicy: Retain` to ensure that the PV is retained even if the PVC is deleted. This is to ensure your data safety.
mountClusterClientSecret
PD and TiKV support configuring `mountClusterClientSecret`. If TLS is enabled between cluster components, it is recommended to configure `spec.pd.mountClusterClientSecret: true` and `spec.tikv.mountClusterClientSecret: true`. With this configuration, TiDB Operator automatically mounts the `${cluster_name}-cluster-client-secret` certificate to the PD and TiKV containers, so you can conveniently use `pd-ctl` and `tikv-ctl`.
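Putting the recommendations above together, the relevant TidbCluster fields might look as follows (assuming TLS between cluster components is enabled):

spec:
  configUpdateStrategy: RollingUpdate
  enableDynamicConfiguration: true
  pvReclaimPolicy: Retain
  pd:
    mountClusterClientSecret: true
  tikv:
    mountClusterClientSecret: true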
Storage
Storage Class
You can set the storage class by modifying `storageClassName` of each component in `${cluster_name}/tidb-cluster.yaml` and `${cluster_name}/tidb-monitor.yaml`. For the storage classes supported by the Kubernetes cluster, check with your system administrator.
Different components of a TiDB cluster have different disk requirements. Before deploying a TiDB cluster, refer to the Storage Configuration document to select an appropriate storage class for each component according to the storage classes supported by the current Kubernetes cluster and usage scenario.
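For example, a minimal sketch that sets a storage class per component, assuming a local-storage class exists in your Kubernetes cluster (as used in examples later in this document):

spec:
  pd:
    storageClassName: local-storage
  tikv:
    storageClassName: local-storage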
Multiple disks mounting
TiDB Operator supports mounting multiple PVs for PD, TiDB, TiKV, and TiCDC, which can be used for data writing for different purposes.
You can configure the `storageVolumes` field for each component to describe multiple user-customized PVs.
The meanings of the related fields are as follows:
- `storageVolume.name`: the name of the PV.
- `storageVolume.storageClassName`: the StorageClass that the PV uses. If not configured, `spec.pd/tidb/tikv/ticdc.storageClassName` will be used.
- `storageVolume.storageSize`: the storage size of the requested PV.
- `storageVolume.mountPath`: the path of the container to mount the PV to.
For example:
To mount multiple PVs for TiKV:
tikv:
...
config: |
[rocksdb]
wal-dir = "/data_sbi/tikv/wal"
[titan]
dirname = "/data_sbj/titan/data"
storageVolumes:
- name: wal
storageSize: "2Gi"
mountPath: "/data_sbi/tikv/wal"
- name: titan
storageSize: "2Gi"
mountPath: "/data_sbj/titan/data"
To mount multiple PVs for TiDB:
tidb:
config: |
path = "/tidb/data"
[log.file]
filename = "/tidb/log/tidb.log"
storageVolumes:
- name: data
storageSize: "2Gi"
mountPath: "/tidb/data"
- name: log
storageSize: "2Gi"
mountPath: "/tidb/log"
To mount multiple PVs for PD:
pd:
config: |
data-dir = "/pd/data"
[log.file]
filename = "/pd/log/pd.log"
storageVolumes:
- name: data
storageSize: "10Gi"
mountPath: "/pd/data"
- name: log
storageSize: "10Gi"
mountPath: "/pd/log"
To mount multiple PVs for TiCDC:
ticdc:
...
config:
dataDir: /ticdc/data
logFile: /ticdc/log/cdc.log
storageVolumes:
- name: data
storageSize: "10Gi"
storageClassName: local-storage
mountPath: "/ticdc/data"
- name: log
storageSize: "10Gi"
storageClassName: local-storage
mountPath: "/ticdc/log"
HostNetwork
For PD, TiKV, TiDB, TiFlash, TiCDC, and Pump, you can configure the Pods to use the host network (`HostNetwork`).
To enable `HostNetwork` for all supported components, configure `spec.hostNetwork: true`.
To enable `HostNetwork` for specified components, configure `hostNetwork: true` for those components.
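For example, a minimal sketch that enables HostNetwork for the TiDB component only:

spec:
  tidb:
    hostNetwork: true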
Discovery
TiDB Operator starts a Discovery service for each TiDB cluster. The Discovery service can return the corresponding startup parameters for each PD Pod to support the startup of the PD cluster. You can configure resources of the Discovery service using `spec.discovery`. For details, see Managing Resources for Containers.

A `spec.discovery` configuration example is as follows:
spec:
discovery:
limits:
cpu: "0.2"
requests:
cpu: "0.2"
...
Cluster topology
PD/TiKV/TiDB
The deployed cluster topology by default has three PD Pods, three TiKV Pods, and two TiDB Pods. In this deployment topology, the scheduler extender of TiDB Operator requires at least three nodes in the Kubernetes cluster to provide high availability. You can modify the `replicas` configuration to change the number of Pods for each component.
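For example, the default topology described above corresponds to the following replicas settings:

spec:
  pd:
    replicas: 3
  tikv:
    replicas: 3
  tidb:
    replicas: 2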
Enable TiFlash
If you want to enable TiFlash in the cluster, configure `spec.pd.config.replication.enable-placement-rules: true` and configure `spec.tiflash` in the `${cluster_name}/tidb-cluster.yaml` file as follows:
pd:
config: |
...
[replication]
enable-placement-rules = true
tiflash:
baseImage: pingcap/tiflash
maxFailoverCount: 0
replicas: 1
storageClaims:
- resources:
requests:
storage: 100Gi
storageClassName: local-storage
TiFlash supports mounting multiple Persistent Volumes (PVs). If you want to configure multiple PVs for TiFlash, configure multiple `resources` in `tiflash.storageClaims`, each `resources` with a separate storage `request` and `storageClassName`. For example:
tiflash:
baseImage: pingcap/tiflash
maxFailoverCount: 0
replicas: 1
storageClaims:
- resources:
requests:
storage: 100Gi
storageClassName: local-storage
- resources:
requests:
storage: 100Gi
storageClassName: local-storage
TiFlash mounts all PVs to directories such as `/data0` and `/data1` in the container in the order of configuration. TiFlash has four log files. The proxy log is printed in the standard output of the container. The other three logs are stored on the disk under the `/data0` directory by default: `/data0/logs/flash_cluster_manager.log`, `/data0/logs/error.log`, and `/data0/logs/server.log`. To modify the log storage path, refer to Configure TiFlash parameters.
Enable TiCDC
If you want to enable TiCDC in the cluster, you can add the TiCDC spec to the `TidbCluster` CR. For example:
spec:
ticdc:
baseImage: pingcap/ticdc
replicas: 3
Configure TiDB components
This section introduces how to configure the parameters of TiDB/TiKV/PD/TiFlash/TiCDC.
Configure TiDB parameters
TiDB parameters can be configured by `spec.tidb.config` in the TidbCluster Custom Resource.
For example:
spec:
tidb:
config: |
split-table = true
oom-action = "log"
For all the configurable parameters of TiDB, refer to TiDB Configuration File.
Configure TiKV parameters
TiKV parameters can be configured by `spec.tikv.config` in the TidbCluster Custom Resource.
For example:
spec:
tikv:
config: |
[storage]
[storage.block-cache]
capacity = "16GB"
For all the configurable parameters of TiKV, refer to TiKV Configuration File.
Configure PD parameters
PD parameters can be configured by `spec.pd.config` in the TidbCluster Custom Resource.
For example:
spec:
pd:
config: |
lease = 3
enable-prevote = true
For all the configurable parameters of PD, refer to PD Configuration File.
Configure TiFlash parameters
TiFlash parameters can be configured by `spec.tiflash.config` in the TidbCluster Custom Resource.
For example:
spec:
tiflash:
config:
config: |
[flash]
[flash.flash_cluster]
log = "/data0/logs/flash_cluster_manager.log"
[logger]
count = 10
level = "information"
errorlog = "/data0/logs/error.log"
log = "/data0/logs/server.log"
For all the configurable parameters of TiFlash, refer to TiFlash Configuration File.
Configure TiCDC start parameters
You can configure TiCDC start parameters through `spec.ticdc.config` in the TidbCluster Custom Resource.
For example:
For TiDB Operator v1.2.0-rc.2 and later versions, configure the parameters in the TOML format as follows:
spec:
ticdc:
config: |
gc-ttl = 86400
log-level = "info"
For TiDB Operator versions earlier than v1.2.0-rc.2, configure the parameters in the YAML format as follows:
spec:
ticdc:
config:
timezone: UTC
gcTTL: 86400
logLevel: info
For all configurable start parameters of TiCDC, see TiCDC configuration.
Configure automatic failover thresholds of PD, TiDB, TiKV, and TiFlash
The automatic failover feature is enabled by default in TiDB Operator. When the Pods of PD, TiDB, TiKV, or TiFlash fail, or when the corresponding nodes fail, TiDB Operator performs failover automatically and replenishes the number of Pod replicas by scaling the corresponding component.
To avoid the automatic failover feature creating too many Pods, you can configure the threshold of the maximum number of Pods that TiDB Operator can create during failover for each component. The default threshold is `3`. If the threshold for a component is configured to `0`, the automatic failover feature is disabled for this component. An example configuration is as follows:
pd:
maxFailoverCount: 3
tidb:
maxFailoverCount: 3
tikv:
maxFailoverCount: 3
tiflash:
maxFailoverCount: 3
Configure graceful upgrade for TiDB cluster
When you perform a rolling update to the TiDB cluster, Kubernetes sends a `TERM` signal to the TiDB server before it stops the TiDB Pod. When the TiDB server receives the `TERM` signal, it tries to wait for all connections to close. After 15 seconds, the TiDB server forcibly closes all the connections and exits the process.
You can enable this feature by configuring the following items:
- `spec.tidb.terminationGracePeriodSeconds`: the longest tolerable duration to delete the old TiDB Pod during the rolling upgrade. If this duration is exceeded, the TiDB Pod will be deleted forcibly.
- `spec.tidb.lifecycle`: sets the `preStop` hook for the TiDB Pod, which is the operation executed before the TiDB server stops.
spec:
tidb:
terminationGracePeriodSeconds: 60
lifecycle:
preStop:
exec:
command:
- /bin/sh
- -c
- "sleep 10 && kill -QUIT 1"
The YAML file above:
- Sets the longest tolerable duration to delete the TiDB Pod to 60 seconds. If the client does not close the connections after 60 seconds, these connections will be closed forcibly. You can adjust the value according to your needs.
- Sets the value of the `preStop` hook to `sleep 10 && kill -QUIT 1`. Here `PID 1` refers to the PID of the TiDB server process in the TiDB Pod. When the TiDB server process receives the signal, it exits only after all the connections are closed by the client.
When Kubernetes deletes the TiDB Pod, it also removes the TiDB node from the service endpoints. This ensures that new connections are not established to this TiDB node. However, because this process is asynchronous, you can make the system sleep for a few seconds before sending the `kill` signal, which makes sure that the TiDB node is removed from the endpoints.
Configure graceful upgrade for TiKV cluster
During a TiKV upgrade, TiDB Operator evicts all Region leaders from the TiKV Pod before restarting it. The TiKV Pod is restarted only after the eviction is completed (which means the number of Region leaders on the TiKV Pod drops to 0) or after the eviction exceeds the specified timeout (1500 minutes by default). If TiKV has fewer than 2 replicas, TiDB Operator forces an upgrade without waiting for the timeout.
If the eviction of Region leaders exceeds the specified timeout, restarting the TiKV Pod causes issues such as failures of some requests or higher latency. To avoid these issues, you can configure the timeout `spec.tikv.evictLeaderTimeout` (1500 minutes by default) to a larger value. For example:
spec:
tikv:
evictLeaderTimeout: 10000m
Configure graceful upgrade for TiCDC cluster
During a TiCDC upgrade, TiDB Operator drains all replication workloads from the TiCDC Pod before restarting it. The TiCDC Pod is restarted only after the draining is completed or after the draining exceeds the specified timeout (10 minutes by default). If TiCDC has fewer than 2 instances, TiDB Operator forces an upgrade without waiting for the timeout.
If the draining exceeds the specified timeout, restarting the TiCDC Pod causes issues such as higher replication latency. To avoid these issues, you can configure the timeout `spec.ticdc.gracefulShutdownTimeout` (10 minutes by default) to a larger value. For example:
spec:
ticdc:
gracefulShutdownTimeout: 100m
Configure PV for TiDB slow logs
By default, TiDB Operator creates a `slowlog` volume (which is an `EmptyDir`) to store the slow logs, mounts the `slowlog` volume to `/var/log/tidb`, and prints slow logs to `stdout` through a sidecar container.
If you want to use a separate PV to store the slow logs, you can specify the name of the PV in `spec.tidb.slowLogVolumeName`, and then configure the PV in `spec.tidb.storageVolumes` or `spec.tidb.additionalVolumes`.
This section shows how to configure the PV using `spec.tidb.storageVolumes` or `spec.tidb.additionalVolumes`.
Configure using spec.tidb.storageVolumes
Configure the `TidbCluster` CR as in the following example. In the example, TiDB Operator uses the `${volumeName}` PV to store slow logs. The log file path is `${mountPath}/${volumeName}`.
For how to configure the `spec.tidb.storageVolumes` field, refer to Multiple disks mounting.
tidb:
...
separateSlowLog: true # can be omitted; defaults to true
slowLogVolumeName: ${volumeName}
storageVolumes:
# name must be consistent with slowLogVolumeName
- name: ${volumeName}
storageClassName: ${storageClass}
storageSize: "1Gi"
mountPath: ${mountPath}
Configure using spec.tidb.additionalVolumes
In the following example, NFS is used as the storage, and TiDB Operator uses the `${volumeName}` PV to store slow logs. The log file path is `${mountPath}/${volumeName}`.
For the supported PV types, refer to Persistent Volumes.
tidb:
...
separateSlowLog: true # can be omitted; defaults to true
slowLogVolumeName: ${volumeName}
additionalVolumes:
# name must be consistent with slowLogVolumeName
- name: ${volumeName}
nfs:
server: 192.168.0.2
path: /nfs
additionalVolumeMounts:
# name must be consistent with slowLogVolumeName
- name: ${volumeName}
mountPath: ${mountPath}
Configure TiDB service
You need to configure `spec.tidb.service` so that TiDB Operator creates a Service for TiDB. You can configure the Service with different types according to the scenario, such as `ClusterIP`, `NodePort`, and `LoadBalancer`.
General configurations
Different types of services share some general configurations as follows:
- `spec.tidb.service.annotations`: the annotations added to the Service resource.
- `spec.tidb.service.labels`: the labels added to the Service resource.
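For example, a sketch that adds annotations and labels to the TiDB Service; the annotation and label keys here are hypothetical placeholders:

spec:
  tidb:
    service:
      type: ClusterIP
      annotations:
        # hypothetical annotation key, for illustration only
        example.com/owner: "dba-team"
      labels:
        # hypothetical label key, for illustration only
        environment: "production"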
ClusterIP
`ClusterIP` exposes the service through the internal IP of the cluster. When selecting this service type, you can only access it within the cluster, using the ClusterIP or the Service domain name (`${cluster_name}-tidb.${namespace}`).
spec:
...
tidb:
service:
type: ClusterIP
NodePort
If there is no LoadBalancer, you can choose to expose the service through NodePort. NodePort exposes the service through the node's IP and a static port. You can access a NodePort service from outside the cluster by requesting `NodeIP + NodePort`.
spec:
...
tidb:
service:
type: NodePort
# externalTrafficPolicy: Local
NodePort has two modes:
- `externalTrafficPolicy=Cluster`: all machines in the cluster allocate a NodePort port to TiDB, which is the default value. When using the `Cluster` mode, you can access the TiDB service through the IP and NodePort of any machine. If there is no TiDB Pod on the machine, the corresponding request is forwarded to a machine with a TiDB Pod.
- `externalTrafficPolicy=Local`: only the machines that TiDB is running on allocate a NodePort port to access the local TiDB instance.
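For example, to use the Local mode, set externalTrafficPolicy explicitly instead of leaving it commented out as in the earlier example:

spec:
  tidb:
    service:
      type: NodePort
      externalTrafficPolicy: Local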
LoadBalancer
If the TiDB cluster runs in an environment with LoadBalancer, such as on GCP or AWS, it is recommended to use the LoadBalancer feature of these cloud platforms by setting `tidb.service.type=LoadBalancer`.
spec:
...
tidb:
service:
annotations:
cloud.google.com/load-balancer-type: "Internal"
externalTrafficPolicy: Local
type: LoadBalancer
See the Kubernetes Service documentation to learn more about the features of Service and the LoadBalancer implementations supported by cloud platforms.
Configure high availability
TiDB is a distributed database, and its high availability must ensure that when any physical topology node fails, not only is the service unaffected, but also the data is complete and available. The two types of high availability configuration are described separately as follows.
High availability of TiDB service
Use nodeSelector to schedule Pods
By configuring the `nodeSelector` field of each component, you can specify the nodes onto which the component Pods are scheduled. For details on `nodeSelector`, refer to nodeSelector.
apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
# ...
spec:
  pd:
    nodeSelector:
      node-role.kubernetes.io/pd: "true"
    # ...
  tikv:
    nodeSelector:
      node-role.kubernetes.io/tikv: "true"
    # ...
  tidb:
    nodeSelector:
      node-role.kubernetes.io/tidb: "true"
    # ...
Use tolerations to schedule Pods
By configuring the `tolerations` field of each component, you can allow the component Pods to be scheduled onto nodes with matching taints. For details on taints and tolerations, refer to Taints and Tolerations.
apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
# ...
spec:
pd:
tolerations:
- effect: NoSchedule
key: dedicated
operator: Equal
value: pd
# ...
tikv:
tolerations:
- effect: NoSchedule
key: dedicated
operator: Equal
value: tikv
# ...
tidb:
tolerations:
- effect: NoSchedule
key: dedicated
operator: Equal
value: tidb
# ...
Use affinity to schedule Pods
By configuring `PodAntiAffinity`, you can avoid the situation in which different instances of the same component are deployed on the same physical topology node. In this way, disaster recovery (high availability) is achieved. For the user guide of Affinity, see Affinity & AntiAffinity.
The following is an example of a typical service high availability setup:
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
# this term works when the nodes have the label named region
- weight: 10
podAffinityTerm:
labelSelector:
matchLabels:
app.kubernetes.io/instance: ${cluster_name}
app.kubernetes.io/component: "pd"
topologyKey: "region"
namespaces:
- ${namespace}
# this term works when the nodes have the label named zone
- weight: 20
podAffinityTerm:
labelSelector:
matchLabels:
app.kubernetes.io/instance: ${cluster_name}
app.kubernetes.io/component: "pd"
topologyKey: "zone"
namespaces:
- ${namespace}
# this term works when the nodes have the label named rack
- weight: 40
podAffinityTerm:
labelSelector:
matchLabels:
app.kubernetes.io/instance: ${cluster_name}
app.kubernetes.io/component: "pd"
topologyKey: "rack"
namespaces:
- ${namespace}
# this term works when the nodes have the label named kubernetes.io/hostname
- weight: 80
podAffinityTerm:
labelSelector:
matchLabels:
app.kubernetes.io/instance: ${cluster_name}
app.kubernetes.io/component: "pd"
topologyKey: "kubernetes.io/hostname"
namespaces:
- ${namespace}
Use topologySpreadConstraints to make pods evenly spread
By configuring `topologySpreadConstraints`, you can make Pods evenly spread in different topologies. For instructions about configuring `topologySpreadConstraints`, see Pod Topology Spread Constraints.
To use `topologySpreadConstraints`, you must meet the following conditions:
- Your Kubernetes cluster uses `default-scheduler` instead of `tidb-scheduler`. For details, refer to tidb-scheduler and default-scheduler.
- Your Kubernetes cluster enables the `EvenPodsSpread` feature gate. If the Kubernetes version in use is earlier than v1.16 or if the `EvenPodsSpread` feature gate is disabled, the configuration of `topologySpreadConstraints` does not take effect.
You can either configure `topologySpreadConstraints` at a cluster level (`spec.topologySpreadConstraints`) for all components or at a component level (such as `spec.tidb.topologySpreadConstraints`) for specific components.
The following is an example configuration:
topologySpreadConstraints:
- topologyKey: kubernetes.io/hostname
- topologyKey: topology.kubernetes.io/zone
The example configuration can make pods of the same component evenly spread on different zones and nodes.
Currently, `topologySpreadConstraints` only supports the configuration of the `topologyKey` field. In the Pod spec, the above example configuration will be automatically expanded as follows:
topologySpreadConstraints:
- topologyKey: kubernetes.io/hostname
maxSkew: 1
whenUnsatisfiable: DoNotSchedule
labelSelector: <object>
- topologyKey: topology.kubernetes.io/zone
maxSkew: 1
whenUnsatisfiable: DoNotSchedule
labelSelector: <object>
High availability of data
Before configuring the high availability of data, read Information Configuration of the Cluster Topology, which describes how the high availability of the TiDB cluster is implemented.
To add the data high availability feature on Kubernetes:
1. Set the label collection of topological location for PD.

   Replace the `location-labels` information in `pd.config` with the label collection that describes the topological location of the nodes in the Kubernetes cluster.

2. Set the topological information of the Node where the TiKV node is located.

   TiDB Operator automatically obtains the topological information of the Node for TiKV and calls the PD interface to set this information as the information of TiKV's store labels. Based on this topological information, the TiDB cluster schedules the replicas of the data.

   If the Nodes of the current Kubernetes cluster do not have a label indicating the topological location, or if the existing topology label names contain `/`, you can manually add a label to a Node by running the following command:

   kubectl label node ${node_name} region=${region_name} zone=${zone_name} rack=${rack_name} kubernetes.io/hostname=${host_name}

   In the command above, `region`, `zone`, `rack`, and `kubernetes.io/hostname` are just examples. The name and number of the labels to be added can be arbitrarily defined, as long as they conform to the specification and are consistent with the labels set by `location-labels` in `pd.config`.
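For example, a sketch of the matching `location-labels` setting in `pd.config`, assuming the example labels from the command above:

pd:
  config: |
    [replication]
    location-labels = ["region", "zone", "rack", "kubernetes.io/hostname"]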