Apache Hive Tables
TiDB Cloud Lake can query data that is cataloged by Apache Hive without copying it. Register the Hive Metastore as a TiDB Cloud Lake catalog, point to the object storage that holds the table data, and then query the tables as if they were native TiDB Cloud Lake objects.
Quick Start
Register the Hive Metastore
CREATE CATALOG hive_prod
TYPE = HIVE
CONNECTION = (
    METASTORE_ADDRESS = '127.0.0.1:9083'
    URL = 's3://lakehouse/'
    ACCESS_KEY_ID = '<your_key_id>'
    SECRET_ACCESS_KEY = '<your_secret_key>'
);
Explore the catalog
USE CATALOG hive_prod;
SHOW DATABASES;
SHOW TABLES FROM tpch;
Query Hive tables
SELECT l_orderkey, SUM(l_extendedprice) AS revenue
FROM tpch.lineitem
GROUP BY l_orderkey
ORDER BY revenue DESC
LIMIT 10;
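If the registration statement is generated by a script instead of being typed by hand, the credentials can stay out of the SQL file. The following is a minimal Python sketch, not an official client: it only assembles the CREATE CATALOG text shown in the first step. The helper names `quote_literal` and `build_create_catalog` and the environment variable names `HIVE_ACCESS_KEY_ID` and `HIVE_SECRET_ACCESS_KEY` are hypothetical. Because the source does not state how TiDB Cloud Lake escapes quotes inside string literals, the sketch rejects values that contain a single quote instead of guessing.

```python
import os
import re


def quote_literal(value: str) -> str:
    """Wrap value in single quotes; reject embedded quotes rather than guess an escape rule."""
    if "'" in value:
        raise ValueError("value must not contain a single quote")
    return f"'{value}'"


def build_create_catalog(name, metastore_address, url, access_key_id, secret_access_key):
    """Return a CREATE CATALOG statement shaped like the Quick Start example."""
    # Conservative check on the catalog name (an assumption, not a documented rule).
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        raise ValueError(f"unexpected catalog name: {name!r}")
    return "\n".join([
        f"CREATE CATALOG {name}",
        "TYPE = HIVE",
        "CONNECTION = (",
        f"    METASTORE_ADDRESS = {quote_literal(metastore_address)}",
        f"    URL = {quote_literal(url)}",
        f"    ACCESS_KEY_ID = {quote_literal(access_key_id)}",
        f"    SECRET_ACCESS_KEY = {quote_literal(secret_access_key)}",
        ");",
    ])


if __name__ == "__main__":
    # Hypothetical environment variable names; the placeholders match the Quick Start.
    print(build_create_catalog(
        name="hive_prod",
        metastore_address="127.0.0.1:9083",
        url="s3://lakehouse/",
        access_key_id=os.environ.get("HIVE_ACCESS_KEY_ID", "<your_key_id>"),
        secret_access_key=os.environ.get("HIVE_SECRET_ACCESS_KEY", "<your_secret_key>"),
    ))
```

The printed statement can then be run in whichever SQL client you already use with TiDB Cloud Lake.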
Keep Metadata Fresh
Hive schemas or partitions can change outside of TiDB Cloud Lake. Refresh TiDB Cloud Lake’s cached metadata when that happens:
ALTER TABLE tpch.lineitem REFRESH CACHE;
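When several Hive tables change at once, it can be convenient to generate one refresh statement per table. The following is a minimal Python sketch under stated assumptions: the helper name `refresh_statements` is hypothetical, `tpch.orders` is an illustrative table name that does not appear elsewhere on this page, and the identifier check is a conservative guess, not a documented naming rule. It only produces the ALTER TABLE ... REFRESH CACHE text shown above; running the statements is left to your SQL client.

```python
import re

# Conservative pattern for unquoted identifiers (an assumption, not a documented rule).
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def refresh_statements(tables):
    """Return one ALTER TABLE ... REFRESH CACHE statement per 'database.table' name."""
    statements = []
    for qualified in tables:
        parts = qualified.split(".")
        if len(parts) != 2 or not all(_IDENT.fullmatch(p) for p in parts):
            raise ValueError(f"expected database.table, got {qualified!r}")
        statements.append(f"ALTER TABLE {qualified} REFRESH CACHE;")
    return statements


if __name__ == "__main__":
    # tpch.orders is a hypothetical second table used only for illustration.
    for statement in refresh_statements(["tpch.lineitem", "tpch.orders"]):
        print(statement)
```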
Data Type Mapping
TiDB Cloud Lake automatically converts Hive primitive types to their closest native equivalents when queries run.
Nested structures such as STRUCT are surfaced through the VARIANT type.
Notes and Limitations
- Hive catalogs are read-only in TiDB Cloud Lake (writes must happen through Hive-compatible engines).
- Access to the underlying object storage is required; supply credentials through the catalog's CONNECTION parameters (for example, ACCESS_KEY_ID and SECRET_ACCESS_KEY).
- Use ALTER TABLE ... REFRESH CACHE whenever the table layout changes (for example, new partitions) to keep query results up to date.