Amazon SQS (S3) Integration Task (Beta)
This page describes how to create an Amazon SQS (S3) integration task that consumes S3 object creation events from an SQS queue and writes the corresponding object data into TiDB Cloud Lake.
This task is designed for event-driven ingestion from S3. After an upstream system writes an object to S3, S3 sends an ObjectCreated event to SQS. TiDB Cloud Lake assumes your IAM Role (sts:AssumeRole) to consume the SQS message, then reads the object identified by the bucket and object key in the event and writes its data into TiDB Cloud Lake.
If you need to create reusable SQS (S3) connection settings first, see Amazon SQS (S3) - IAM Role (Beta).
Use Cases
- Automatically ingest newly written S3 objects based on S3 ObjectCreated events
- Use S3 event notifications to drive data ingestion and reduce latency after new files arrive
- Avoid relying only on polling an S3 path to discover new files
Workflow
- An upstream system writes an object to an S3 bucket.
- S3 Event Notification sends the ObjectCreated event to an SQS standard queue.
- TiDB Cloud Lake reads messages from the SQS queue through the IAM Role configured by the user.
- The task parses the S3 event records in the message.
- The task writes data into the TiDB Cloud Lake target table based on the bucket, object key, and file format in the S3 event records.
- After the write succeeds, the task deletes the processed SQS message from the queue.
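The parsing steps above depend on the S3 Event Notification message layout. The following is a minimal sketch of extracting the bucket and object key from an SQS message body, assuming the standard S3 event JSON delivered directly from S3 to SQS (not wrapped by SNS). The function name parse_s3_event and the sample bucket and key are hypothetical, and this is not TiDB Cloud Lake's internal code.

```python
import json
from urllib.parse import unquote_plus


def parse_s3_event(message_body: str) -> list[tuple[str, str, int]]:
    """Return (bucket, key, size) for each ObjectCreated record in an SQS message body."""
    payload = json.loads(message_body)
    # S3 sends a one-off s3:TestEvent (with no "Records") when a notification is configured.
    if payload.get("Event") == "s3:TestEvent":
        return []
    objects = []
    for record in payload.get("Records", []):
        # Only ObjectCreated:* records describe new objects to ingest.
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded (a space arrives as '+').
        key = unquote_plus(s3["object"]["key"])
        size = s3["object"].get("size", 0)
        objects.append((bucket, key, size))
    return objects


# Hypothetical sample message body, trimmed to the fields used above.
sample_body = json.dumps({
    "Records": [
        {
            "eventSource": "aws:s3",
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "events/2024/01/part+0001.csv", "size": 1024},
            },
        }
    ]
})

print(parse_s3_event(sample_body))
```

Decoding the key matters: without unquote_plus, a file named "part 0001.csv" would be looked up as "part+0001.csv" and not found.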
Prerequisites
Before creating an SQS (S3) integration task, make sure:
- An Amazon SQS (S3) - IAM Role data source has already been created
- The S3 bucket has been configured with an ObjectCreated event notification that sends events to the target SQS queue
- The SQS queue policy allows Amazon S3 to call sqs:SendMessage
- The user IAM Role allows TiDB Cloud Lake platform roles to assume it through sts:AssumeRole
- The user IAM Role has permissions to read the target S3 objects and consume the target SQS queue
- The SQS queue contains messages in the standard S3 Event Notification format
- The bucket, prefix, and suffix in the S3 notification match the data source configuration
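The following sketch expresses these prerequisites as the JSON documents AWS expects, built as Python dicts. All ARNs, the account ID, the bucket name, and the events/ and .csv filter values are hypothetical placeholders, including the TiDB Cloud Lake platform role ARN. The shapes follow AWS's documented S3-to-SQS notification and IAM policy formats, but the exact permission set TiDB Cloud Lake requires is an assumption.

```python
import json

# Hypothetical placeholders: replace every value with your own.
BUCKET = "example-bucket"
PREFIX = "events/"
SUFFIX = ".csv"
ACCOUNT_ID = "111122223333"  # account that owns the bucket and queue
QUEUE_ARN = f"arn:aws:sqs:us-east-1:{ACCOUNT_ID}:example-queue"
PLATFORM_ROLE_ARN = "arn:aws:iam::444455556666:role/example-platform-role"  # hypothetical

# S3 bucket notification: route ObjectCreated events for matching keys to SQS.
notification_configuration = {
    "QueueConfigurations": [
        {
            "QueueArn": QUEUE_ARN,
            "Events": ["s3:ObjectCreated:*"],
            "Filter": {
                "Key": {
                    "FilterRules": [
                        {"Name": "prefix", "Value": PREFIX},
                        {"Name": "suffix", "Value": SUFFIX},
                    ]
                }
            },
        }
    ]
}

# SQS queue policy: allow Amazon S3 to call sqs:SendMessage, scoped to one bucket.
queue_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "s3.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": QUEUE_ARN,
            "Condition": {
                "ArnLike": {"aws:SourceArn": f"arn:aws:s3:::{BUCKET}"},
                "StringEquals": {"aws:SourceAccount": ACCOUNT_ID},
            },
        }
    ],
}

# Trust policy on the user IAM Role: let the platform role assume it.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": PLATFORM_ROLE_ARN},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Permissions policy on the user IAM Role (assumed action list).
permissions_policy = {
    "Version": "2012-10-17",
    "Statement": [
        # Read the objects named in events.
        {"Effect": "Allow", "Action": ["s3:GetObject"],
         "Resource": f"arn:aws:s3:::{BUCKET}/{PREFIX}*"},
        # Listing objects (for example, for preview) is an assumption.
        {"Effect": "Allow", "Action": ["s3:ListBucket"],
         "Resource": f"arn:aws:s3:::{BUCKET}"},
        # Consume the queue: receive messages, then delete them after processing.
        {"Effect": "Allow",
         "Action": ["sqs:ReceiveMessage", "sqs:DeleteMessage", "sqs:GetQueueAttributes"],
         "Resource": QUEUE_ARN},
    ],
}

for name, doc in [
    ("bucket notification", notification_configuration),
    ("queue policy", queue_policy),
    ("trust policy", trust_policy),
    ("permissions policy", permissions_policy),
]:
    print(f"--- {name} ---")
    print(json.dumps(doc, indent=2))
```

Keeping PREFIX and SUFFIX in one place makes it easier to keep the notification filter, the IAM resource scope, and the data source configuration aligned, as the last prerequisite requires.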
Creating an SQS (S3) Integration Task
Step 1: Basic Info
Navigate to Data > Data Integration and click Create Task.
Select an SQS (S3) data source, then configure the basic parameters:
Step 2: Preview Data
After completing the basic settings, click Next to preview the source data.
Preview works the same way as for an Amazon S3 Integration Task: the system locates the matching S3 objects based on the SQS (S3) configuration, reads their content, and displays:
- Sample data with column names and data types
- The matched S3 object list and object sizes
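To check ahead of time that the configured prefix and suffix select the files you expect, the matching can be sketched as below. This is a simplified illustration that assumes an in-memory listing instead of a real S3 list call; the S3Object type, the keys, and the sizes are hypothetical, and it is not how TiDB Cloud Lake's preview is implemented. S3 notification filter rules likewise compare prefix and suffix literally against the object key, without wildcards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class S3Object:
    key: str
    size: int  # bytes


def match_objects(objects: list[S3Object], prefix: str = "", suffix: str = "") -> list[S3Object]:
    """Keep objects whose key starts with `prefix` and ends with `suffix` (literal match)."""
    return [o for o in objects if o.key.startswith(prefix) and o.key.endswith(suffix)]


# Hypothetical listing of a bucket.
listing = [
    S3Object("events/2024/01/part-0001.csv", 1024),
    S3Object("events/2024/01/_SUCCESS", 0),
    S3Object("archive/old.csv", 2048),
]

for obj in match_objects(listing, prefix="events/", suffix=".csv"):
    print(f"{obj.key}\t{obj.size} bytes")
```

Here the marker file _SUCCESS and the object outside events/ are both excluded, which is usually what you want from a suffix such as .csv.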
Step 3: Set Target Table
Configure the target location in TiDB Cloud Lake:
The system infers column names and data types from the previewed S3 object content. Before continuing, you can review and edit the target table schema. If you write to an existing table, select the target table and verify the column mapping.
Click Create to create the integration task.
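Schema inference in this step works from sample content. As a simplified sketch of the idea for CSV data, the code below picks the narrowest type that fits every non-empty sample value. The rule and the type names BIGINT, DOUBLE, BOOLEAN, and VARCHAR are assumptions for illustration, not TiDB Cloud Lake's documented inference logic, and the sample data is hypothetical.

```python
import csv
import io


def _all_parse(values: list[str], parse) -> bool:
    """Return True if `parse` accepts every value without raising ValueError."""
    try:
        for v in values:
            parse(v)
    except ValueError:
        return False
    return True


def infer_type(values: list[str]) -> str:
    """Pick the narrowest illustrative type that fits every non-empty sample value."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return "VARCHAR"
    if _all_parse(non_empty, int):
        return "BIGINT"
    if _all_parse(non_empty, float):
        return "DOUBLE"
    if all(v.lower() in ("true", "false") for v in non_empty):
        return "BOOLEAN"
    return "VARCHAR"


def infer_schema(csv_text: str) -> dict[str, str]:
    """Map each CSV header to a type inferred from the sample rows below it."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = reader.fieldnames or []
    # Missing trailing fields come back as None; treat them as empty.
    return {col: infer_type([row[col] or "" for row in rows]) for col in columns}


sample = "id,price,active,name\n1,9.99,true,apple\n2,12.5,false,banana\n"
print(infer_schema(sample))
```

Because inference only sees a sample, a column that looks like BIGINT in the preview may contain text in later files, which is why reviewing the inferred schema before creating the task is worthwhile.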
Task Behavior
An SQS (S3) integration task is a continuously running task. After it starts, it periodically reads messages from the SQS queue and writes data into the target table until it is manually stopped.
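A minimal sketch of such a loop follows, assuming a hypothetical in-memory FakeQueue in place of SQS and a hypothetical write_to_table callback. It illustrates the delete-after-success ordering described in the workflow, not TiDB Cloud Lake's actual scheduler.

```python
import threading
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FakeQueue:
    """Hypothetical in-memory stand-in for an SQS queue of (receipt_handle, body) pairs."""
    messages: deque = field(default_factory=deque)

    def receive(self, max_messages: int = 10) -> list[tuple[str, str]]:
        # Unlike real SQS, nothing is hidden; unacknowledged messages simply stay visible.
        return list(self.messages)[:max_messages]

    def delete(self, receipt_handle: str) -> None:
        self.messages = deque(m for m in self.messages if m[0] != receipt_handle)


def poll_once(queue: FakeQueue, write_to_table: Callable[[str], None]) -> int:
    """Process one batch; return how many messages were written and deleted."""
    processed = 0
    for receipt_handle, body in queue.receive():
        try:
            write_to_table(body)
        except Exception:
            # Keep the message: real SQS makes it visible again after the visibility timeout.
            continue
        # Delete only after a successful write, so a failed write never loses the event.
        queue.delete(receipt_handle)
        processed += 1
    return processed


def run(queue: FakeQueue, write_to_table: Callable[[str], None],
        stop: threading.Event, poll_interval: float = 1.0) -> None:
    """Poll until `stop` is set, mirroring a task that runs until manually stopped."""
    while not stop.is_set():
        poll_once(queue, write_to_table)
        stop.wait(poll_interval)


written: list[str] = []


def write(body: str) -> None:
    # Hypothetical writer that fails on one message to show retry behavior.
    if body == "bad":
        raise ValueError("unparseable object")
    written.append(body)


q = FakeQueue(deque([("r1", "a"), ("r2", "bad"), ("r3", "b")]))
processed = poll_once(q, write)
print(processed, written, [h for h, _ in q.messages])

stop = threading.Event()
worker = threading.Thread(target=run, args=(q, write, stop, 0.01))
worker.start()
stop.set()  # corresponds to manually stopping the task
worker.join()
```

Deleting only after a successful write gives at-least-once delivery: a message whose write fails stays in the queue and is retried, at the cost of possible duplicates if a write succeeds but the delete does not.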
Difference from Amazon S3 Integration Task
If your goal is to periodically scan an S3 path and import file content, use an Amazon S3 Integration Task. If your goal is to trigger ingestion based on S3 ObjectCreated events, use an Amazon SQS (S3) Integration Task.