Back Up TiDB Data to Amazon S3-Compatible Storage Using Dumpling

This document describes how to use Dumpling to back up data from a TiDB cluster deployed on AWS EKS to Amazon S3-compatible storage. Dumpling is a data export tool that exports data from TiDB or MySQL in SQL or CSV format for full data backup or export.

Prepare the Dumpling node pool

You can run Dumpling in an existing node pool or create a dedicated node pool. The following is a sample configuration for creating a new node pool. Replace the variables as needed:

  • ${clusterName}: EKS cluster name

# eks_dumpling.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ${clusterName}
  region: us-west-2
availabilityZones: ['us-west-2a', 'us-west-2b', 'us-west-2c']
nodeGroups:
  - name: dumpling
    instanceType: c5.xlarge
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["us-west-2a"]
    labels:
      dedicated: dumpling

Run the following command to create the node pool:

eksctl create nodegroup -f eks_dumpling.yaml
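Before deploying the job, it can help to confirm that a node carrying the dedicated: dumpling label has joined the cluster. The following is a minimal sketch, assuming kubectl is already configured for the EKS cluster; the function name list_dumpling_nodes is hypothetical and not part of any tool:

```shell
# Hypothetical helper: list the nodes that carry the label set in eks_dumpling.yaml.
# Assumes kubectl already points at the target EKS cluster.
list_dumpling_nodes() {
  kubectl get nodes -l dedicated=dumpling -o wide
}
```

Call list_dumpling_nodes after eksctl finishes. With desiredCapacity set to 1, one node in Ready status is the expected result.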

Deploy the Dumpling job

This section describes how to configure, deploy, and monitor the Dumpling job.

Configure the Dumpling job

The following is a sample configuration file (dumpling_job.yaml) for the Dumpling job. Replace the variables with your specific values as needed:

  • ${name}: job name
  • ${namespace}: Kubernetes namespace
  • ${version}: Dumpling image version
  • For Dumpling parameters, refer to the Option list of Dumpling.

# dumpling_job.yaml
---
apiVersion: batch/v1
kind: Job
metadata:
  name: ${name}
  namespace: ${namespace}
  labels:
    app.kubernetes.io/component: dumpling
spec:
  template:
    spec:
      nodeSelector:
        dedicated: dumpling
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app.kubernetes.io/component
                operator: In
                values:
                - dumpling
            topologyKey: kubernetes.io/hostname
      containers:
        - name: ${name}
          image: pingcap/dumpling:${version}
          command:
            - /bin/sh
            - -c
            - |
              /dumpling \
                  --host=basic-tidb \
                  --port=4000 \
                  --user=root \
                  --password='' \
                  --s3.region=${AWS_REGION} \
                  --threads=16 \
                  --rows=20000 \
                  --filesize=256MiB \
                  --database=test \
                  --filetype=csv \
                  --output=s3://bucket-path/
          env:
            - name: AWS_REGION
              value: ${AWS_REGION}
            - name: AWS_ACCESS_KEY_ID
              value: ${AWS_ACCESS_KEY_ID}
            - name: AWS_SECRET_ACCESS_KEY
              value: ${AWS_SECRET_ACCESS_KEY}
            - name: AWS_SESSION_TOKEN
              value: ${AWS_SESSION_TOKEN}
      restartPolicy: Never
  backoffLimit: 0

Create the Dumpling job

Run the following commands to create the Dumpling job:

export name=dumpling
export version=v8.5.1
export namespace=tidb-cluster
export AWS_REGION=us-west-2
export AWS_ACCESS_KEY_ID=<your-access-key-id>
export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
export AWS_SESSION_TOKEN=<your-session-token> # Optional
envsubst < dumpling_job.yaml | kubectl apply -f -
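envsubst silently replaces an unset variable with an empty string, which can produce a manifest that is invalid or that points at the wrong image. The following sketch checks the required variables first, limits substitution to the variables the manifest uses, and validates the result with a client-side dry run instead of creating the job. The function names check_dumpling_vars and render_dumpling_job are hypothetical, and the sketch assumes GNU envsubst and kubectl are installed:

```shell
# Hypothetical helper: fail early if a variable the manifest needs is unset or empty.
# AWS_SESSION_TOKEN is not checked because it is optional.
check_dumpling_vars() {
  missing=0
  for var in name namespace version AWS_REGION AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "missing variable: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical helper: render dumpling_job.yaml and validate it without creating the job.
# The quoted list restricts envsubst to the variables that appear in the manifest.
render_dumpling_job() {
  check_dumpling_vars || return 1
  envsubst '${name} ${namespace} ${version} ${AWS_REGION} ${AWS_ACCESS_KEY_ID} ${AWS_SECRET_ACCESS_KEY} ${AWS_SESSION_TOKEN}' \
    < dumpling_job.yaml | kubectl apply --dry-run=client -f -
}
```

Run render_dumpling_job after the export commands. If it reports no errors, rerun the pipeline without --dry-run=client to create the job.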

Check the Dumpling job status

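Kubernetes appends a random suffix to the name of the Pod that a Job creates, so select the Pod by its job-name label rather than by the job name itself. Run the following command to check the Pod status of the Dumpling job:

kubectl -n ${namespace} get pod -l job-name=${name}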
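To block until the export finishes instead of polling, kubectl wait can watch the job's Complete condition. The following is a minimal sketch; the function name wait_for_dumpling_job and the one-hour default timeout are assumptions, so size the timeout to your data volume:

```shell
# Hypothetical helper: block until the job reports the Complete condition
# or the timeout expires. Pass a duration such as 7200s to override the default.
wait_for_dumpling_job() {
  kubectl -n "${namespace}" wait --for=condition=complete "job/${name}" --timeout="${1:-3600s}"
}
```

Because backoffLimit is 0, a failed job never reaches the Complete condition, and this command returns only when the timeout expires. In that case, inspect the job with kubectl -n ${namespace} describe job ${name}.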

View Dumpling job logs

Run the following command to view the logs of the Dumpling job. Referencing the job as job/${name} lets kubectl locate the Pod that the job created:

kubectl -n ${namespace} logs job/${name}
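Once the logs show that the export finished, listing the output location is a quick way to confirm that files were written. The following sketch assumes the AWS CLI is installed and can read the same credentials as the job; the function name list_dumpling_output is hypothetical, and s3://bucket-path/ is the placeholder from the sample job configuration:

```shell
# Hypothetical helper: list exported files with sizes and a total count.
# Pass the S3 URI given to --output; it defaults to the sample placeholder.
list_dumpling_output() {
  aws s3 ls "${1:-s3://bucket-path/}" --recursive --human-readable --summarize --region "${AWS_REGION}"
}
```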
