📣
TiDB Cloud Premium is now in public preview. Unlimited growth, instant elasticity, advanced security for enterprise workloads. Try it out →

Querying Parquet Files in Stage



Syntax

Tutorial

Step 1. Create an External Stage

Create an external stage with your own S3 bucket and credentials where your Parquet files are stored.

CREATE STAGE parquet_query_stage URL = 's3://load/parquet/' CONNECTION = ( ACCESS_KEY_ID = '<your-access-key-id>' SECRET_ACCESS_KEY = '<your-secret-access-key>' );

Step 2. Create Custom Parquet File Format

CREATE FILE FORMAT parquet_query_format TYPE = PARQUET;

Step 3. Query Parquet Files

query with colum names:

SELECT * FROM @parquet_query_stage ( FILE_FORMAT => 'parquet_query_format', PATTERN => '.*[.]parquet' );

query with path expressions:

SELECT $1 FROM @parquet_query_stage ( FILE_FORMAT => 'parquet_query_format', PATTERN => '.*[.]parquet' );

Query with Metadata

Query Parquet files directly from a stage, including metadata columns like METADATA$FILENAME and METADATA$FILE_ROW_NUMBER:

SELECT METADATA$FILENAME, METADATA$FILE_ROW_NUMBER, * FROM @parquet_query_stage ( FILE_FORMAT => 'parquet_query_format', PATTERN => '.*[.]parquet' );

Was this page helpful?