📣
TiDB Cloud Premium is now in public preview. Unlimited growth, instant elasticity, advanced security for enterprise workloads. Try it out →

Transforming Data on Load



TiDB Cloud Lake's COPY INTO command allows data transformation during the loading process. This streamlines your ETL pipeline by incorporating basic transformations, eliminating the need for temporary tables.

See Querying & Transforming for syntax.

Key transformations you can perform:

  • Loading a subset of data columns: Selectively import specific columns.
  • Reordering columns: Change column order during load.
  • Converting datatypes: Ensure consistency and compatibility.
  • Performing arithmetic operations: Generate new derived data.
  • Loading data to a table with additional columns: Map and insert data into existing structures.

Tutorials

These tutorials demonstrate data transformation during loading. Each example shows loading from a staged file.

Before You Begin

Create a stage and generate a sample Parquet file:

CREATE STAGE my_parquet_stage; COPY INTO @my_parquet_stage FROM ( SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS id, 'Name_' || CAST(number AS VARCHAR) AS name, 20 + MOD(number, 23) AS age, DATE_ADD('day', MOD(number, 60), '2022-01-01') AS onboarded FROM numbers(10) ) FILE_FORMAT = (TYPE = PARQUET);

Query the staged sample file:

SELECT * FROM @my_parquet_stage;

Result:

┌───────────────────────────────────────┐ │ id │ name │ age │ onboarded │ ├────────┼────────┼────────┼────────────┤ │ 1 │ Name_0 │ 20 │ 2022-01-01 │ │ 2 │ Name_5 │ 25 │ 2022-01-06 │ │ 3 │ Name_1 │ 21 │ 2022-01-02 │ │ 4 │ Name_6 │ 26 │ 2022-01-07 │ │ 5 │ Name_7 │ 27 │ 2022-01-08 │ │ 6 │ Name_2 │ 22 │ 2022-01-03 │ │ 7 │ Name_8 │ 28 │ 2022-01-09 │ │ 8 │ Name_3 │ 23 │ 2022-01-04 │ │ 9 │ Name_4 │ 24 │ 2022-01-05 │ │ 10 │ Name_9 │ 29 │ 2022-01-10 │ └───────────────────────────────────────┘

Tutorial 1 - Loading a Subset of Data Columns

Load data into a table with fewer columns than the source file (e.g., excluding 'age').

CREATE TABLE employees_no_age ( id INT, name VARCHAR, onboarded timestamp ); COPY INTO employees_no_age FROM ( SELECT t.id, t.name, t.onboarded FROM @my_parquet_stage t ) FILE_FORMAT = (TYPE = PARQUET) PATTERN = '.*parquet'; SELECT * FROM employees_no_age;

Result (first 3 rows):

┌──────────────────────────────────────────────────────────┐ │ id │ name │ onboarded │ ├─────────────────┼──────────────────┼─────────────────────┤ │ 1 │ Name_0 │ 2022-01-01 00:00:00 │ │ 2 │ Name_5 │ 2022-01-06 00:00:00 │ │ 3 │ Name_1 │ 2022-01-02 00:00:00 │ └──────────────────────────────────────────────────────────┘

Tutorial 2 - Reordering Columns During Load

Load data into a table with columns in a different order (e.g., 'age' before 'name').

CREATE TABLE employees_new_order ( id INT, age INT, name VARCHAR, onboarded timestamp ); COPY INTO employees_new_order FROM ( SELECT t.id, t.age, t.name, t.onboarded FROM @my_parquet_stage t ) FILE_FORMAT = (TYPE = PARQUET) PATTERN = '.*parquet'; SELECT * FROM employees_new_order;

Result (first 3 rows):

┌────────────────────────────────────────────────────────────────────────────┐ │ id │ age │ name │ onboarded │ ├─────────────────┼─────────────────┼──────────────────┼─────────────────────┤ │ 1 │ 20 │ Name_0 │ 2022-01-01 00:00:00 │ │ 2 │ 25 │ Name_5 │ 2022-01-06 00:00:00 │ │ 3 │ 21 │ Name_1 │ 2022-01-02 00:00:00 │ └────────────────────────────────────────────────────────────────────────────┘

Tutorial 3 - Converting Datatypes During Load

Load data and convert a column's datatype (e.g., 'onboarded' to DATE).

CREATE TABLE employees_date ( id INT, name VARCHAR, age INT, onboarded date ); COPY INTO employees_date FROM ( SELECT t.id, t.name, t.age, to_date(t.onboarded) FROM @my_parquet_stage t ) FILE_FORMAT = (TYPE = PARQUET) PATTERN = '.*parquet'; SELECT * FROM employees_date;

Result (first 3 rows):

┌───────────────────────────────────────────────────────────────────────┐ │ id │ name │ age │ onboarded │ ├─────────────────┼──────────────────┼─────────────────┼────────────────┤ │ 1 │ Name_0 │ 20 │ 2022-01-01 │ │ 2 │ Name_5 │ 25 │ 2022-01-06 │ │ 3 │ Name_1 │ 21 │ 2022-01-02 │ └───────────────────────────────────────────────────────────────────────┘

Tutorial 4 - Performing Arithmetic Operations During Load

Load data and perform arithmetic operations (e.g., increment 'age' by 1).

CREATE TABLE employees_new_age ( id INT, name VARCHAR, age INT, onboarded timestamp ); COPY INTO employees_new_age FROM ( SELECT t.id, t.name, t.age + 1, t.onboarded FROM @my_parquet_stage t ) FILE_FORMAT = (TYPE = PARQUET) PATTERN = '.*parquet'; SELECT * FROM employees_new_age;

Result (first 3 rows):

┌────────────────────────────────────────────────────────────────────────────┐ │ id │ name │ age │ onboarded │ ├─────────────────┼──────────────────┼─────────────────┼─────────────────────┤ │ 1 │ Name_0 │ 21 │ 2022-01-01 00:00:00 │ │ 2 │ Name_5 │ 26 │ 2022-01-06 00:00:00 │ │ 3 │ Name_1 │ 22 │ 2022-01-02 00:00:00 │ └────────────────────────────────────────────────────────────────────────────┘

Tutorial 5 - Loading to a Table with Additional Columns

Load data into a table that has more columns than the source file.

CREATE TABLE employees_plus ( id INT, name VARCHAR, age INT, onboarded timestamp, lastday timestamp ); COPY INTO employees_plus (id, name, age, onboarded) FROM ( SELECT t.id, t.name, t.age, t.onboarded FROM @my_parquet_stage t ) FILE_FORMAT = (TYPE = PARQUET) PATTERN = '.*parquet'; SELECT * FROM employees_plus;

Result (first 3 rows):

┌──────────────────────────────────────────────────────────────────────────────────────────────────┐ │ id │ name │ age │ onboarded │ lastday │ ├─────────────────┼──────────────────┼─────────────────┼─────────────────────┼─────────────────────┤ │ 1 │ Name_0 │ 20 │ 2022-01-01 00:00:00 │ NULL │ │ 2 │ Name_5 │ 25 │ 2022-01-06 00:00:00 │ NULL │ │ 3 │ Name_1 │ 21 │ 2022-01-02 00:00:00 │ NULL │ └──────────────────────────────────────────────────────────────────────────────────────────────────┘

Was this page helpful?