Common Network Issues of TiDB on Kubernetes
This document describes the common network issues of TiDB on Kubernetes and their solutions.
Network connection failure between Pods
In a TiDB cluster, you can access most Pods by using the Pod's domain name (allocated by the Headless Service). The exception is TiDB Operator: when it collects cluster information or issues control commands, it accesses the PD (Placement Driver) cluster using the service-name of the PD service.
If logs or monitoring metrics show network connection issues among Pods, or if the symptoms suggest that the network connection among Pods might be abnormal, follow these steps to diagnose and narrow down the problem:
Confirm that the endpoints of the Service and Headless Service are normal:
    kubectl -n ${namespace} get endpoints ${cluster_name}-pd
    kubectl -n ${namespace} get endpoints ${cluster_name}-tidb
    kubectl -n ${namespace} get endpoints ${cluster_name}-pd-peer
    kubectl -n ${namespace} get endpoints ${cluster_name}-tikv-peer
    kubectl -n ${namespace} get endpoints ${cluster_name}-tidb-peer
The ENDPOINTS field shown in the output of the above commands must be a comma-separated list of cluster_ip:port. If the field is empty or incorrect, check the health of the Pod and whether kube-controller-manager is working properly.
Enter the Pod's Network Namespace to diagnose network problems:
    tkctl debug -n ${namespace} ${pod_name}
After the remote shell is started, use the dig command to diagnose the DNS resolution. If the DNS resolution is abnormal, refer to Debugging DNS Resolution for troubleshooting.
    dig ${HOSTNAME}
Use the ping command to diagnose the connection with the destination IP (the Pod IP resolved using dig):
    ping ${TARGET_IP}
If the ping check fails, refer to Debugging Kubernetes Networking for troubleshooting.
If the ping check succeeds, continue to check whether the target port is open by using telnet:
    telnet ${TARGET_IP} ${TARGET_PORT}
If the telnet check fails, check whether the port corresponding to the Pod is correctly exposed and whether the port of the application is correctly configured:
    # Checks whether the ports are consistent.
    kubectl -n ${namespace} get po ${pod_name} -ojson | jq '.spec.containers[].ports[].containerPort'
    # Checks whether the application is correctly configured to serve the specified port.
    # The default port of PD is 2379 when not configured.
    kubectl -n ${namespace} -it exec ${pod_name} -- cat /etc/pd/pd.toml | grep client-urls
    # The default port of TiKV is 20160 when not configured.
    kubectl -n ${namespace} -it exec ${pod_name} -- cat /etc/tikv/tikv.toml | grep addr
    # The default port of TiDB is 4000 when not configured.
    kubectl -n ${namespace} -it exec ${pod_name} -- cat /etc/tidb/tidb.toml | grep port
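The endpoint check at the start of this procedure can be scripted so that an empty ENDPOINTS field stands out immediately. The following is a minimal sketch, not part of TiDB Operator or tkctl: the function name `check_endpoints` and its output format are made up for illustration, and it assumes that `kubectl` is already configured for the target cluster and that the five Service names follow the `${cluster_name}-<suffix>` pattern used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which TiDB cluster Services have no endpoint addresses.

# check_endpoints NAMESPACE CLUSTER_NAME
# Prints one line per Service and returns 1 if any Service has no addresses.
check_endpoints() {
  local namespace="$1" cluster_name="$2"
  local suffix ips rc=0
  for suffix in pd tidb pd-peer tikv-peer tidb-peer; do
    # Collect the ready addresses behind the Service; empty output means no endpoints.
    ips="$(kubectl -n "${namespace}" get endpoints "${cluster_name}-${suffix}" \
      -o jsonpath='{.subsets[*].addresses[*].ip}' 2>/dev/null)"
    if [ -z "${ips}" ]; then
      echo "EMPTY ${cluster_name}-${suffix}"
      rc=1
    else
      echo "OK ${cluster_name}-${suffix} ${ips}"
    fi
  done
  return "${rc}"
}
```

For example, `check_endpoints ${namespace} ${cluster_name}` prints an `EMPTY` line for each Service whose Pods need a closer look. A Service that does not exist is also reported as `EMPTY`, because the error output of `kubectl` is discarded.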
Unable to access the TiDB service
If you cannot access the TiDB service, first check whether the TiDB service is deployed successfully:
Check whether all components of the cluster are up and the status of each component is Running.
    kubectl get po -n ${namespace}
Check whether the TiDB service correctly generates the endpoint object:
    kubectl get endpoints -n ${namespace} ${cluster_name}-tidb
Check the log of TiDB components to see whether errors are reported.
    kubectl logs -f ${pod_name} -n ${namespace} -c tidb
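In a namespace with many Pods, it can be easier to list only the Pods that are not in the Running status. The sketch below is one possible way to do that. The function name `not_running_pods` is hypothetical, and it assumes the default column layout of `kubectl get po` (NAME, READY, STATUS, ...).

```shell
#!/usr/bin/env bash
# Hypothetical helper: list Pods whose STATUS column is not "Running".

# not_running_pods NAMESPACE
# Prints "<pod name> <status>" for every Pod that is not Running.
not_running_pods() {
  local namespace="$1"
  # --no-headers drops the header row; column 1 is NAME and column 3 is STATUS.
  kubectl get po -n "${namespace}" --no-headers \
    | awk '$3 != "Running" { print $1, $3 }'
}
```

Empty output from `not_running_pods ${namespace}` means every Pod reports Running. It does not show whether all containers in a Pod are ready, so the endpoint and log checks above are still needed.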
If the cluster is successfully deployed, check the network using the following steps:
If you cannot access the TiDB service using NodePort, try to access the TiDB service using the clusterIP on the node. If the clusterIP works, the network within the Kubernetes cluster is normal. Then the possible issues are as follows:
- Network failure exists between the client and the node.
- Check whether the externalTrafficPolicy attribute of the TiDB service is Local. If it is Local, the client must access the TiDB service using the IP of the node where the TiDB Pod is located.
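The externalTrafficPolicy check can be sketched as follows. This is an illustration only: the function name `nodeport_targets` is hypothetical, and it assumes that TiDB Pods are named `${cluster_name}-tidb-<ordinal>`, which is consistent with the Service names used in this document but should be confirmed in your cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the externalTrafficPolicy of the TiDB Service and,
# when it is Local, the IPs of the nodes that host TiDB Pods.

# nodeport_targets NAMESPACE CLUSTER_NAME
nodeport_targets() {
  local namespace="$1" cluster_name="$2" policy
  policy="$(kubectl -n "${namespace}" get svc "${cluster_name}-tidb" \
    -o jsonpath='{.spec.externalTrafficPolicy}')"
  echo "externalTrafficPolicy=${policy:-unset}"
  if [ "${policy}" = "Local" ]; then
    # Only nodes that run a TiDB Pod accept NodePort traffic under the Local policy.
    # Assumption: TiDB Pod names start with "<cluster_name>-tidb-".
    kubectl -n "${namespace}" get po --no-headers \
      -o custom-columns=NAME:.metadata.name,HOSTIP:.status.hostIP \
      | awk -v prefix="${cluster_name}-tidb-" 'index($1, prefix) == 1 { print $2 }' \
      | sort -u
  fi
}
```

If the first line of output is `externalTrafficPolicy=Local`, point the client at one of the node IPs printed after it, together with the NodePort of the TiDB service.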
If you still cannot access the TiDB service using the clusterIP, connect using <PodIP>:4000 on the TiDB service backend. If the PodIP works, you can confirm that the problem is in the connection between clusterIP and PodIP. Check the following items:
Check whether kube-proxy on each node is working.
    kubectl get po -n kube-system -l k8s-app=kube-proxy
Check whether the TiDB service rule is correct in the iptables rules.
    iptables-save -t nat | grep ${clusterIP}
Check whether the corresponding endpoint is correct:
    kubectl get endpoints -n ${namespace} ${cluster_name}-tidb
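To compare the clusterIP path with the PodIP path from a node, a plain TCP probe is often enough. The sketch below uses the `/dev/tcp` redirection built into bash together with the coreutils `timeout` command, so it assumes both are available on the node. The function names `probe_port` and `compare_paths` are hypothetical, and the default port 4000 is the TiDB default mentioned earlier.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: probe the same port through the clusterIP and the PodIP.

# probe_port HOST PORT [TIMEOUT_SECONDS]
# Returns 0 if a TCP connection can be opened within the timeout.
probe_port() {
  local host="$1" port="$2" wait="${3:-3}"
  # bash opens a TCP connection when a redirection targets /dev/tcp/HOST/PORT.
  timeout "${wait}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "${host}" "${port}" 2>/dev/null
}

# compare_paths CLUSTER_IP POD_IP [PORT]
# Prints one line per path so that a clusterIP-only failure is easy to spot.
compare_paths() {
  local cluster_ip="$1" pod_ip="$2" port="${3:-4000}"
  local label ip
  for label in clusterIP PodIP; do
    if [ "${label}" = "clusterIP" ]; then ip="${cluster_ip}"; else ip="${pod_ip}"; fi
    if probe_port "${ip}" "${port}"; then
      echo "${label} ${ip}:${port} reachable"
    else
      echo "${label} ${ip}:${port} unreachable"
    fi
  done
}
```

If the PodIP line reports reachable while the clusterIP line reports unreachable, the kube-proxy, iptables, and endpoint checks above are the place to look. A successful probe only shows that the port accepts TCP connections, not that TiDB answers queries.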
If you cannot access the TiDB service even using PodIP, the problem is on the Pod level network. Check the following items:
- Check whether the relevant route rules on the node are correct.
- Check whether the network plugin service works well.
- Refer to the "Network connection failure between Pods" section.