TiDB Cloud Lake vs. Snowflake: Data Ingestion Benchmark



Overview

We ran four benchmarks comparing TiDB Cloud Lake with Snowflake:

  1. TPC-H SF100 Dataset Loading: Measures loading performance and cost for a large-scale dataset (100GB, ~600 million rows).
  2. ClickBench Hits Dataset Loading: Measures loading efficiency for a wide-table dataset (76GB, ~100 million rows, 105 columns), where the high column count is the main challenge.
  3. 1-Second Freshness: Measures how much data each platform can ingest within a strict 1-second freshness requirement.
  4. 5-Second Freshness: Measures how much data each platform can ingest within a 5-second freshness requirement.

Platforms

  • Snowflake: A well-known cloud data platform emphasizing scalable compute and data sharing.
  • TiDB Cloud Lake: A cloud-native data warehouse focusing on scalability and cost-efficiency.

Benchmark Conditions

Both platforms were benchmarked on a Small warehouse, loading data from the same S3 bucket.

Performance and Cost Comparison

  • TPC-H SF100 Data: TiDB Cloud Lake cuts loading cost by 48% compared with Snowflake ($0.40 vs. $0.77).
  • ClickBench Hits Data: TiDB Cloud Lake cuts loading cost by 84% ($0.53 vs. $3.42).
  • 1-Second Freshness: TiDB Cloud Lake ingests 400 times as many rows as Snowflake (40,000 vs. 100).
  • 5-Second Freshness: TiDB Cloud Lake ingests over 27 times as many rows (2,500,000 vs. 90,000).

Data Ingestion Benchmarks

TPC-H SF100 Dataset

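| Metric     | Snowflake | TiDB Cloud Lake | Description               |
|------------|-----------|-----------------|---------------------------|
| Total Time | 695s      | 446s            | Time to load the dataset. |
| Total Cost | $0.77     | $0.40           | Cost of data loading.     |
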
  • Data Volume: 100GB
  • Rows: Approx. 600 million

ClickBench Hits Dataset

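| Metric     | Snowflake | TiDB Cloud Lake | Description               |
|------------|-----------|-----------------|---------------------------|
| Total Time | 51m 17s   | 9m 58s          | Time to load the dataset. |
| Total Cost | $3.42     | $0.53           | Cost of data loading.     |
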
  • Data Volume: 76GB
  • Rows: Approx. 100 million
  • Table Width: 105 columns
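
As a rough cross-check of the two tables above, the reported load times can be converted into throughput. This is a minimal Python sketch, not part of the original benchmark tooling: it assumes decimal gigabytes, uses the approximate row counts listed above, and the names `LoadResult` and `throughput` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadResult:
    dataset: str
    platform: str
    data_gb: float  # dataset size as reported (assumed to be decimal GB)
    rows: int       # approximate row count as reported
    seconds: int    # reported total load time, in seconds


def throughput(r: LoadResult) -> tuple[float, float]:
    """Return (MB per second, rows per second) for one load."""
    return r.data_gb * 1000 / r.seconds, r.rows / r.seconds


# Values copied from the TPC-H SF100 and ClickBench Hits tables.
RESULTS = [
    LoadResult("TPC-H SF100", "Snowflake", 100, 600_000_000, 695),
    LoadResult("TPC-H SF100", "TiDB Cloud Lake", 100, 600_000_000, 446),
    LoadResult("ClickBench Hits", "Snowflake", 76, 100_000_000, 51 * 60 + 17),
    LoadResult("ClickBench Hits", "TiDB Cloud Lake", 76, 100_000_000, 9 * 60 + 58),
]

for r in RESULTS:
    mb_s, rows_s = throughput(r)
    print(f"{r.dataset:16} {r.platform:16} {mb_s:7.1f} MB/s {rows_s:12,.0f} rows/s")
```

Under these assumptions, the gap by load time is considerably wider on the 105-column ClickBench table (roughly 5x) than on TPC-H (roughly 1.6x).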

Freshness Benchmarks

1-Second Freshness Benchmark

Evaluates the volume of data ingested within a 1-second freshness requirement.

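| Metric     | Snowflake | TiDB Cloud Lake | Description                                     |
|------------|-----------|-----------------|-------------------------------------------------|
| Total Time | 1s        | 1s              | Loading time frame.                             |
| Total Rows | 100 Rows  | 40,000 Rows     | Volume of data successfully ingested within 1s. |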

5-Second Freshness Benchmark

Assesses the volume of data that can be ingested within a 5-second freshness requirement.

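| Metric     | Snowflake   | TiDB Cloud Lake | Description                                     |
|------------|-------------|-----------------|-------------------------------------------------|
| Total Time | 5s          | 5s              | Loading time frame.                             |
| Total Rows | 90,000 Rows | 2,500,000 Rows  | Volume of data successfully ingested within 5s. |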
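
The ratios quoted for the freshness benchmarks follow directly from the row counts in the two tables above. The short Python sketch below reproduces that arithmetic; the dictionary layout and function names are illustrative, not part of the benchmark itself.

```python
# Rows ingested inside each freshness window, copied from the tables above.
FRESHNESS_ROWS = {
    1: {"Snowflake": 100, "TiDB Cloud Lake": 40_000},
    5: {"Snowflake": 90_000, "TiDB Cloud Lake": 2_500_000},
}


def ingest_ratio(window_seconds: int) -> float:
    """How many times more rows TiDB Cloud Lake ingested than Snowflake."""
    rows = FRESHNESS_ROWS[window_seconds]
    return rows["TiDB Cloud Lake"] / rows["Snowflake"]


def rows_per_second(window_seconds: int, platform: str) -> float:
    """Average ingest rate over the freshness window."""
    return FRESHNESS_ROWS[window_seconds][platform] / window_seconds


for window, by_platform in FRESHNESS_ROWS.items():
    print(f"{window}s window: ratio {ingest_ratio(window):.1f}x")
    for platform in by_platform:
        print(f"  {platform}: {rows_per_second(window, platform):,.0f} rows/s")
```

Normalizing by the window length makes the two benchmarks comparable: both platforms sustain a higher average rate in the 5-second window than in the 1-second window.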

Reproduce the Benchmark

You can reproduce the benchmark by following the steps below.

Benchmark Environment

The benchmark tests both Snowflake and TiDB Cloud Lake under similar conditions:

| Parameter      | Snowflake | TiDB Cloud Lake |
|----------------|-----------|-----------------|
| Warehouse Size | Small     | Small           |
| Price          | $4/hour   | $3.2/hour       |
| AWS Region     | us-east-2 | us-east-2       |
| Storage        | AWS S3    | AWS S3          |
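
The loading costs reported under Data Ingestion Benchmarks are consistent with multiplying each load time by the hourly warehouse price listed above. The source does not state its cost formula, so the Python sketch below treats that relationship as an assumption; the function names are illustrative.

```python
def load_cost_usd(load_seconds: float, price_per_hour: float) -> float:
    """Assumed cost model: warehouse is billed only for the load duration."""
    return load_seconds * price_per_hour / 3600.0


def cost_reduction_pct(baseline: float, candidate: float) -> float:
    """Percentage by which candidate is cheaper than baseline."""
    return (1.0 - candidate / baseline) * 100.0


# Hourly prices from the Benchmark Environment table.
SNOWFLAKE_PRICE = 4.0
LAKE_PRICE = 3.2

# Reported load times in seconds: (Snowflake, TiDB Cloud Lake).
LOAD_SECONDS = {
    "TPC-H SF100": (695, 446),
    "ClickBench Hits": (51 * 60 + 17, 9 * 60 + 58),
}

for name, (snowflake_s, lake_s) in LOAD_SECONDS.items():
    # Round to cents, as the result tables do.
    snowflake_cost = round(load_cost_usd(snowflake_s, SNOWFLAKE_PRICE), 2)
    lake_cost = round(load_cost_usd(lake_s, LAKE_PRICE), 2)
    reduction = cost_reduction_pct(snowflake_cost, lake_cost)
    print(f"{name}: Snowflake ${snowflake_cost:.2f}, "
          f"TiDB Cloud Lake ${lake_cost:.2f}, reduction {reduction:.1f}%")
```

With costs rounded to cents, this yields the $0.77, $0.40, $3.42, and $0.53 figures from the result tables, and reductions in line with the 48% and 84% headline numbers.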

Prerequisites

Data Ingestion Benchmark

The data ingestion benchmark can be reproduced using the following steps:

TPC-H Data Loading
  1. Snowflake Data Load:

  2. TiDB Cloud Lake Data Load:

ClickBench Hits Data Loading
  1. Snowflake Data Load:

  2. TiDB Cloud Lake Data Load:

Freshness Benchmark

Data generation and ingestion for the freshness benchmark can be reproduced using the following steps:

  1. Create an external stage in TiDB Cloud Lake.

    CREATE STAGE hits_unload_stage
      URL = 's3://unload/files/'
      CONNECTION = (
        ACCESS_KEY_ID = '<your-access-key-id>',
        SECRET_ACCESS_KEY = '<your-secret-access-key>'
      );
  2. Unload data to the external stage.

    CREATE OR REPLACE FILE FORMAT tsv_unload_format_gzip
      TYPE = TSV,
      COMPRESSION = gzip;
    COPY INTO @hits_unload_stage
      FROM ( SELECT * FROM hits LIMIT <the-rows-you-want> )
      FILE_FORMAT = (FORMAT_NAME = 'tsv_unload_format_gzip')
      DETAILED_OUTPUT = true;
  3. Load data from the external stage to the hits table.

    COPY INTO hits
      FROM @hits_unload_stage
      PATTERN = '.*[.]tsv[.]gz'
      FILE_FORMAT = (TYPE = TSV, COMPRESSION = auto);
  4. Measure results from the dashboard.
