COPY INTO <location>
COPY INTO allows you to unload data from a table or query into one or more files in one of the following locations:
- User / Internal / External stages: See What is Stage? to learn about stages in TiDB Cloud Lake.
- Buckets or containers created in a storage service.
See also: COPY INTO <table>
Syntax
COPY INTO { internalStage | externalStage | externalLocation }
FROM { [<database_name>.]<table_name> | ( <query> ) }
[ PARTITION BY ( <expr> ) ]
[ FILE_FORMAT = (
FORMAT_NAME = '<your-custom-format>'
| TYPE = { CSV | TSV | NDJSON | PARQUET | LANCE } [ formatTypeOptions ]
) ]
[ copyOptions ]
[ VALIDATION_MODE = RETURN_ROWS ]
[ DETAILED_OUTPUT = true | false ]
internalStage
internalStage ::= @<internal_stage_name>[/<path>]
externalStage
externalStage ::= @<external_stage_name>[/<path>]
externalLocation
externalLocation ::=
's3://<bucket>[<path>]'
CONNECTION = (
<connection_parameters>
)
For the connection parameters available for accessing Amazon S3-like storage services, see Connection Parameters.
externalLocation ::=
'azblob://<container>[<path>]'
CONNECTION = (
<connection_parameters>
)
For the connection parameters available for accessing Azure Blob Storage, see Connection Parameters.
externalLocation ::=
'gcs://<bucket>[<path>]'
CONNECTION = (
<connection_parameters>
)
For the connection parameters available for accessing Google Cloud Storage, see Connection Parameters.
externalLocation ::=
'oss://<bucket>[<path>]'
CONNECTION = (
<connection_parameters>
)
For the connection parameters available for accessing Alibaba Cloud OSS, see Connection Parameters.
externalLocation ::=
'cos://<bucket>[<path>]'
CONNECTION = (
<connection_parameters>
)
For the connection parameters available for accessing Tencent Cloud Object Storage, see Connection Parameters.
FILE_FORMAT
See Input & Output File Formats for details.
LANCE is supported only in COPY INTO <location>. TiDB Cloud Lake writes a Lance dataset directory under the target path, not a standalone file.
PARTITION BY
Specifies an expression used to partition the unloaded data into separate folders. The expression must evaluate to a STRING type. Each distinct value produced by the expression creates a subfolder in the destination path, and the corresponding rows are written into files under that subfolder.
- If the expression evaluates to
NULL, the rows are placed in a special_NULL_folder. - The expression can reference any columns from the source table or query.
- Path traversal (
..) is not allowed in partition values.
The following options are incompatible with PARTITION BY and will cause an error if set:
copyOptions
copyOptions ::=
[ SINGLE = true | false ]
[ MAX_FILE_SIZE = <num> ]
[ OVERWRITE = true | false ]
[ INCLUDE_QUERY_ID = true | false ]
[ USE_RAW_PATH = true | false ]
DETAILED_OUTPUT
Determines whether a detailed result of the data unloading should be returned, with the default value set to false. For more information, see Output.
Output
COPY INTO provides a summary of the data unloading results with these columns:
When DETAILED_OUTPUT is set to true, COPY INTO provides results with the following columns. This assists in locating the unloaded files, especially when using MAX_FILE_SIZE to separate the unloaded data into multiple files.
Examples
In this section, the provided examples make use of the following table and data:
-- Create sample table
CREATE TABLE canadian_city_population (
city_name VARCHAR(50),
population INT
);
-- Insert sample data
INSERT INTO canadian_city_population (city_name, population)
VALUES
('Toronto', 2731571),
('Montreal', 1704694),
('Vancouver', 631486),
('Calgary', 1237656),
('Ottawa', 934243),
('Edmonton', 972223),
('Quebec City', 542298),
('Winnipeg', 705244),
('Hamilton', 536917),
('Halifax', 403390);
Example 1: Unloading to Internal Stage
This example unloads data to an internal stage:
-- Create an internal stage
CREATE STAGE my_internal_stage;
-- Unload data from the table to the stage using the PARQUET file format
COPY INTO @my_internal_stage
FROM canadian_city_population
FILE_FORMAT = (TYPE = PARQUET);
ββββββββββββββββββββββββββββββββββββββββββββββ
β rows_unloaded β input_bytes β output_bytes β
βββββββββββββββββΌββββββββββββββΌβββββββββββββββ€
β 10 β 211 β 572 β
ββββββββββββββββββββββββββββββββββββββββββββββ
LIST @my_internal_stage;
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β name β size β md5 β last_modified β creator β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββΌβββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββββββββββΌβββββββββββββββββββ€
β data_abe520a3-ee88-488c-9221-b07c562c9a30_0000_00000000.parquet β 572 β NULL β 2024-01-18 16:20:48.979 +0000 β NULL β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Example 2: Unloading to Compressed File
This example unloads data into a compressed file:
-- Create an internal stage
CREATE STAGE my_internal_stage;
-- Unload data from the table to the stage using the CSV file format with gzip compression
COPY INTO @my_internal_stage
FROM canadian_city_population
FILE_FORMAT = (TYPE = CSV COMPRESSION = gzip);
ββββββββββββββββββββββββββββββββββββββββββββββ
β rows_unloaded β input_bytes β output_bytes β
βββββββββββββββββΌββββββββββββββΌβββββββββββββββ€
β 10 β 182 β 168 β
ββββββββββββββββββββββββββββββββββββββββββββββ
LIST @my_internal_stage;
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β name β size β md5 β last_modified β creator β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββΌβββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββββββββββΌβββββββββββββββββββ€
β data_7970afa5-32e3-4e7d-b793-e42a2a82a8e6_0000_00000000.csv.gz β 168 β NULL β 2024-01-18 16:27:01.663 +0000 β NULL β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
-- COPY INTO also works with custom file formats. See below:
-- Create a custom file format named my_csv_gzip with CSV format and gzip compression
CREATE FILE FORMAT my_csv_gzip TYPE = CSV COMPRESSION = gzip;
-- Unload data from the table to the stage using the custom file format my_csv_gzip
COPY INTO @my_internal_stage
FROM canadian_city_population
FILE_FORMAT = (FORMAT_NAME = 'my_csv_gzip');
ββββββββββββββββββββββββββββββββββββββββββββββ
β rows_unloaded β input_bytes β output_bytes β
βββββββββββββββββΌββββββββββββββΌβββββββββββββββ€
β 10 β 182 β 168 β
ββββββββββββββββββββββββββββββββββββββββββββββ
LIST @my_internal_stage;
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β name β size β md5 β last_modified β creator β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββΌβββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββββββββββΌβββββββββββββββββββ€
β data_d006ba1c-0609-46d7-a67b-75c7078d86ff_0000_00000000.csv.gz β 168 β NULL β 2024-01-18 16:29:29.721 +0000 β NULL β
β data_7970afa5-32e3-4e7d-b793-e42a2a82a8e6_0000_00000000.csv.gz β 168 β NULL β 2024-01-18 16:27:01.663 +0000 β NULL β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Example 3: Unloading to Bucket
This example unloads data into a bucket on MinIO:
-- Unload data from the table to a bucket named 'lake' on MinIO using the PARQUET file format
COPY INTO 's3://lake'
CONNECTION = (
ENDPOINT_URL = 'http://localhost:9000/',
ACCESS_KEY_ID = 'ROOTUSER',
SECRET_ACCESS_KEY = 'CHANGEME123',
region = 'us-west-2'
)
FROM canadian_city_population
FILE_FORMAT = (TYPE = PARQUET);
ββββββββββββββββββββββββββββββββββββββββββββββ
β rows_unloaded β input_bytes β output_bytes β
βββββββββββββββββΌββββββββββββββΌβββββββββββββββ€
β 10 β 211 β 572 β
ββββββββββββββββββββββββββββββββββββββββββββββ
Example 4: Unloading with PARTITION BY
This example unloads data into partitioned folders based on a derived expression:
-- Create a sample table
CREATE TABLE sales_data (
sale_date DATE,
region VARCHAR,
amount INT
);
INSERT INTO sales_data VALUES
('2025-01-15', 'east', 100),
('2025-01-20', 'west', 200),
('2025-02-10', 'east', 150),
(NULL, 'west', 50);
-- Create an internal stage
CREATE STAGE partitioned_stage;
-- Unload data partitioned by year-month derived from sale_date
-- When sale_date is NULL, to_varchar() returns NULL, so the entire
-- concatenation evaluates to NULL and the row lands in the _NULL_ folder.
COPY INTO @partitioned_stage
FROM sales_data
PARTITION BY ('month=' || to_varchar(sale_date, 'YYYY-MM'))
FILE_FORMAT = (TYPE = PARQUET);
-- Verify the partitioned folder layout
SELECT name FROM list_stage(location => '@partitioned_stage') ORDER BY name;
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β name β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β _NULL_/data_<query_id>_0000_00000000.parquet β
β month=2025-01/data_<query_id>_0000_00000000.parquet β
β month=2025-02/data_<query_id>_0000_00000000.parquet β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
When the partition expression evaluates to NULL, the data is placed in a _NULL_ folder. Each unique partition value creates its own subfolder containing the corresponding data files.
Example 5: Unloading to a Lance Dataset
This example unloads data as a Lance dataset directory instead of standalone files:
CREATE STAGE ml_stage;
COPY INTO @ml_stage/datasets/train
FROM (
SELECT number, number + 1 AS label
FROM numbers(10)
)
FILE_FORMAT = (TYPE = LANCE)
USE_RAW_PATH = TRUE
OVERWRITE = TRUE
DETAILED_OUTPUT = TRUE;
The output path will contain a Lance dataset layout with entries similar to:
datasets/train/_versions/...
datasets/train/data/... .lance
datasets/train/*.manifest
For a complete end-to-end example, including validation with Python lance, see Unloading Lance Dataset.