📣

TiDB Cloud Premium is now in public preview. Unlimited growth, instant elasticity, advanced security for enterprise workloads. Try it out →

TiDB Cloud Lake vs. Snowflake: Data Ingestion Benchmark

Overview

We conducted four specific benchmarks to evaluate TiDB Cloud Lake versus Snowflake:

TPC-H SF100 Dataset Loading: Focuses on loading performance and cost for a large-scale dataset (100GB, ~600 million rows).
ClickBench Hits Dataset Loading: Tests efficiency in loading a wide-table dataset (76GB, ~100 million rows, 105 columns), emphasizing challenges associated with high column counts.
1-Second Freshness: Measures the platforms' ability to ingest data within a strict 1-second freshness requirement.
5-Second Freshness: Compares the platforms' data ingestion capabilities under a 5-second freshness constraint.

Platforms

Snowflake: A well-known cloud data platform emphasizing scalable compute, data sharing.
TiDB Cloud Lake: A cloud-native data warehouse focusing on scalability and cost-efficiency.

Benchmark Conditions

Conducted on a Small-Size warehouse using data from the same S3 bucket.

Note

This comparison is based on Snowflake Gen1 standard warehouses (GENERATION = '1').

Performance and Cost Comparison

TPC-H SF100 Data: TiDB Cloud Lake offers a 48% cost reduction over Snowflake.
ClickBench Hits Data: TiDB Cloud Lake achieves a 84% cost reduction.
1-Second Freshness: TiDB Cloud Lake loads 400 times more data than Snowflake.
5-Second Freshness: TiDB Cloud Lake loads over 27 times more data.

Data Ingestion Benchmarks

TPC-H SF100 Dataset

Metric	Snowflake	TiDB Cloud Lake	Description
Total Time	695s	446s	Time to load the dataset.
Total Cost	$0.77	$0.40	Cost of data loading.

Data Volume: 100GB
Rows: Approx. 600 million

ClickBench Hits Dataset

Metric	Snowflake	TiDB Cloud Lake	Description
Total Time	51m 17s	9m 58s	Time to load the dataset.
Total Cost	$3.42	$0.53	Cost of data loading.

Data Volume: 76GB
Rows: Approx. 100 million
Table Width: 105 columns

Freshness Benchmarks

1-Second Freshness Benchmark

Evaluates the volume of data ingested within a 1-second freshness requirement.

Metric	Snowflake	TiDB Cloud Lake	Description
Total Time	1s	1s	Loading time frame.
Total Rows	100 Rows	40,000 Rows	Volume of data successfully ingested within 1s.

5-Second Freshness Benchmark

Assesses the volume of data that can be ingested within a 5-second freshness requirement.

Metric	Snowflake	TiDB Cloud Lake	Description
Total Time	5s	5s	Loading time frame.
Total Rows	90,000 Rows	2,500,000 Rows	Volume of data successfully ingested within 5s.

Reproduce the Benchmark

You can reproduce the benchmark by following the steps below.

Benchmark Environment

The benchmark tests both Snowflake and TiDB Cloud Lake under similar conditions:

Parameter	Snowflake	TiDB Cloud Lake
Warehouse Size	Small	Small
Price	$4/hour	$3.2/hour
AWS Region	us-east-2	us-east-2
Storage	AWS S3	AWS S3

The TPC-H SF100 dataset, sourced from Amazon Redshift.
The ClickBench dataset, sourced from ClickBench.

Prerequisites

Have a Snowflake account
Create a TiDB Cloud Lake account

Data Ingestion Benchmark

The data ingestion benchmark can be reproduced using the following steps:

TPC-H Data Loading

Snowflake Data Load:
- Log into your Snowflake account.
- Create tables corresponding to the TPC-H schema. SQL Script.
- Use the COPY INTO command to load the data from AWS S3. SQL Script.
TiDB Cloud Lake Data Load:
- Sign in to your TiDB Cloud Lake account.
- Create the necessary tables as per the TPC-H schema. SQL Script.
- Use a method similar to Snowflake for loading data from AWS S3. SQL Script.

ClickBench Hits Data Loading

Snowflake Data Load:
- Log into your Snowflake account.
- Create tables corresponding to the hits schema. SQL Script.
- Use the COPY INTO command to load the data from AWS S3. SQL Script.
TiDB Cloud Lake Data Load:
- Sign in to your TiDB Cloud Lake account.
- Create the necessary tables as per the hits schema. SQL Script.
- Use a method similar to Snowflake for loading data from AWS S3. SQL Script.

Freshness Benchmark

Data generation and ingestion for the freshness benchmark can be reproduced using the following steps:

Create an external stage in TiDB Cloud Lake.

CREATE STAGE hits_unload_stage
URL = 's3://unload/files/'
CONNECTION = (
    ACCESS_KEY_ID = '<your-access-key-id>',
    SECRET_ACCESS_KEY = '<your-secret-access-key>'
);

Unload data to the external stage.

CREATE or REPLACE FILE FORMAT tsv_unload_format_gzip
    TYPE = TSV,
    COMPRESSION = gzip;

COPY INTO @hits_unload_stage
FROM (
    SELECT *
    FROM hits limit <the-rows-you-want>
)
FILE_FORMAT = (FORMAT_NAME = 'tsv_unload_format_gzip')
DETAILED_OUTPUT = true;

Load data from the external stage to the hits table.

COPY INTO hits
    FROM @hits_unload_stage
    PATTERN = '.*[.]tsv.gz'
    FILE_FORMAT = (TYPE = TSV,  COMPRESSION=auto);

Measure results from the dashboard.