Apache Hive Tables

TiDB Cloud Lake can query data that is cataloged by Apache Hive without copying it. Register the Hive Metastore as a TiDB Cloud Lake catalog, point to the object storage that holds the table data, and then query the tables as if they were native TiDB Cloud Lake objects.

Quick Start

Register the Hive Metastore

CREATE CATALOG hive_prod
TYPE = HIVE
CONNECTION = (
  METASTORE_ADDRESS = '127.0.0.1:9083'
  URL = 's3://lakehouse/'
  ACCESS_KEY_ID = '<your_key_id>'
  SECRET_ACCESS_KEY = '<your_secret_key>'
);
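
To confirm that the catalog was registered, it can help to list the catalogs and inspect the stored definition. The sketch below assumes that the SHOW CATALOGS and SHOW CREATE CATALOG statements are available in your TiDB Cloud Lake deployment; hive_prod is the catalog name used in the statement above.

```sql
-- List all catalogs; hive_prod should appear alongside the built-in catalog.
SHOW CATALOGS;

-- Show the statement that defines the catalog, to check the
-- metastore address and storage URL that were registered.
SHOW CREATE CATALOG hive_prod;
```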

Explore the catalog

USE CATALOG hive_prod;
SHOW DATABASES;
SHOW TABLES FROM tpch;

Query Hive tables

SELECT l_orderkey, SUM(l_extendedprice) AS revenue
FROM tpch.lineitem
GROUP BY l_orderkey
ORDER BY revenue DESC
LIMIT 10;
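
A Hive table can likely also be referenced without switching catalogs first. This is a minimal sketch, assuming that three-part catalog.database.table names are accepted; that assumption is not stated on this page.

```sql
-- Address the Hive table by its fully qualified name: catalog.database.table.
SELECT COUNT(*) AS line_count
FROM hive_prod.tpch.lineitem;
```

Fully qualified names can be convenient when one query needs to read from both the Hive catalog and native tables.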

Keep Metadata Fresh

Hive schemas or partitions can change outside of TiDB Cloud Lake. Refresh TiDB Cloud Lake’s cached metadata when that happens:

ALTER TABLE tpch.lineitem REFRESH CACHE;
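
A typical sequence after a Hive-side change might look like the sketch below. It assumes that the session has already run USE CATALOG hive_prod and that the table has the standard TPC-H column l_shipdate, which this page does not show.

```sql
-- A partition was added through a Hive-compatible engine,
-- so refresh the cached metadata first.
ALTER TABLE tpch.lineitem REFRESH CACHE;

-- Then re-run a query that should now see the new data.
SELECT MAX(l_shipdate) AS latest_ship_date
FROM tpch.lineitem;
```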

Data Type Mapping

TiDB Cloud Lake automatically converts Hive types to their closest native equivalents when queries run:

Hive Type	TiDB Cloud Lake Type
BOOLEAN	BOOLEAN
TINYINT, SMALLINT, INT, BIGINT	Integer types
FLOAT, DOUBLE	Floating-point types
DECIMAL(p,s)	DECIMAL
STRING, VARCHAR, CHAR	STRING
DATE, TIMESTAMP	DATETIME
ARRAY<type>	ARRAY
MAP<key,value>	MAP

Nested structures such as STRUCT are surfaced through the VARIANT type.
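
As an illustration of reading a STRUCT column, the sketch below uses a hypothetical Hive table tpch.customer_profile with a column address of type STRUCT<city:STRING, zip:STRING>. It also assumes that the VARIANT path syntax column['field'] and the :: cast operator apply to such columns. None of these names come from this page.

```sql
-- Hypothetical table and column: address is a Hive STRUCT surfaced as VARIANT.
SELECT
  c_custkey,
  address['city'] AS city,        -- extract one field from the VARIANT value
  address['zip']::STRING AS zip   -- cast the extracted value to STRING
FROM tpch.customer_profile
LIMIT 10;
```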

Notes and Limitations

Hive catalogs are read-only in TiDB Cloud Lake; writes must go through a Hive-compatible engine.
TiDB Cloud Lake needs access to the underlying object storage; supply the credentials through the catalog's CONNECTION parameters (for example, ACCESS_KEY_ID and SECRET_ACCESS_KEY).
Run ALTER TABLE ... REFRESH CACHE whenever the table layout changes (for example, when new partitions are added) to keep query results up to date.
