Back Up TiDB Data to Amazon S3-Compatible Storage Using Dumpling
This document describes how to use Dumpling to back up data from a TiDB cluster deployed on AWS EKS to Amazon S3-compatible storage. Dumpling is a data export tool that exports data stored in TiDB or MySQL as SQL or CSV files, which you can use for a logical full backup or export.
Prepare the Dumpling node pool
You can run Dumpling in an existing node pool or create a dedicated node pool. The following is a sample configuration for creating a new node pool. Replace the variables as needed:
${clusterName}: EKS cluster name
# eks_dumpling.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ${clusterName}
  region: us-west-2
availabilityZones: ['us-west-2a', 'us-west-2b', 'us-west-2c']
nodeGroups:
  - name: dumpling
    instanceType: c5.xlarge
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["us-west-2a"]
    labels:
      dedicated: dumpling
Run the following command to create the node pool:
eksctl create nodegroup -f eks_dumpling.yaml
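The dedicated label above lets the Dumpling job select these nodes, but by itself it does not prevent other workloads from being scheduled onto them. If you want the node pool reserved exclusively for Dumpling, one option (an optional sketch, not part of the original setup) is to taint the nodes, for example with kubectl taint nodes -l dedicated=dumpling dedicated=dumpling:NoSchedule, and give the Dumpling Pod a matching toleration:

```yaml
# Optional sketch: a toleration to add under spec.template.spec in
# dumpling_job.yaml, assuming the dumpling nodes carry the
# dedicated=dumpling:NoSchedule taint described above.
tolerations:
  - key: dedicated
    operator: Equal
    value: dumpling
    effect: NoSchedule
```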
Deploy the Dumpling job
This section describes how to configure, deploy, and monitor the Dumpling job.
Configure the Dumpling job
The following is a sample configuration file (dumpling_job.yaml) for the Dumpling job. Replace the variables with your specific values as needed:
${name}: job name
${namespace}: Kubernetes namespace
${version}: Dumpling image version
For Dumpling parameters, refer to the Option list of Dumpling.
# dumpling_job.yaml
---
apiVersion: batch/v1
kind: Job
metadata:
  name: ${name}
  namespace: ${namespace}
  labels:
    app.kubernetes.io/component: dumpling
spec:
  template:
    spec:
      nodeSelector:
        dedicated: dumpling
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app.kubernetes.io/component
                operator: In
                values:
                - dumpling
            topologyKey: kubernetes.io/hostname
      containers:
        - name: ${name}
          image: pingcap/dumpling:${version}
          command:
            - /bin/sh
            - -c
            - |
              /dumpling \
                  --host=basic-tidb \
                  --port=4000 \
                  --user=root \
                  --password='' \
                  --s3.region=${AWS_REGION} \
                  --threads=16 \
                  --rows=20000 \
                  --filesize=256MiB \
                  --database=test \
                  --filetype=csv \
                  --output=s3://bucket-path/
          env:
            - name: AWS_REGION
              value: ${AWS_REGION}
            - name: AWS_ACCESS_KEY_ID
              value: ${AWS_ACCESS_KEY_ID}
            - name: AWS_SECRET_ACCESS_KEY
              value: ${AWS_SECRET_ACCESS_KEY}
            - name: AWS_SESSION_TOKEN
              value: ${AWS_SESSION_TOKEN}
      restartPolicy: Never
  backoffLimit: 0
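In the command above, --rows=20000 splits each table into chunks of at most 20,000 rows, and --filesize=256MiB further caps the size of each output file. As a rough sketch (the 1,000,000-row table size below is a hypothetical example, not from this document), the minimum number of chunk files per table is:

```shell
# Sketch: estimate the minimum number of chunk files Dumpling produces
# for one table with --rows=20000. The table size is hypothetical.
total_rows=1000000
rows_per_chunk=20000
# Ceiling division: chunks = ceil(total_rows / rows_per_chunk)
chunks=$(( (total_rows + rows_per_chunk - 1) / rows_per_chunk ))
echo "$chunks"   # prints 50
```

Because --filesize=256MiB also applies, wide rows can push the real file count higher than this estimate.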
Create the Dumpling job
Run the following commands to create the Dumpling job:
export name=dumpling
export version=v8.5.1
export namespace=tidb-cluster
export AWS_REGION=us-west-2
export AWS_ACCESS_KEY_ID=<your-access-key-id>
export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
export AWS_SESSION_TOKEN=<your-session-token> # Optional
envsubst < dumpling_job.yaml | kubectl apply -f -
Check the Dumpling job status
Run the following command to check the Pod status of the Dumpling job:
kubectl -n ${namespace} get pod -l job-name=${name}
View Dumpling job logs
Run the following command to view the logs of the Dumpling job:
kubectl -n ${namespace} logs job/${name}