📣
TiDB Cloud Premium is now in public preview. Unlimited growth, instant elasticity, advanced security for enterprise workloads. Try it out →

HISTOGRAM



Generates a data distribution histogram using an "equal height" bucketing strategy.

Syntax

HISTOGRAM(<expr>) -- The following two forms are equivalent: HISTOGRAM(<max_num_buckets>)(<expr>) HISTOGRAM(<expr> [, <max_num_buckets>])
ParameterDescription
exprThe data type of expr should be sortable.
max_num_bucketsOptional positive integer specifying the maximum number of buckets. Default is 128.

Return Type

Returns either an empty string or a JSON object with the following structure:

  • buckets: List of buckets with detailed information:
    • lower: Lower bound of the bucket.
    • upper: Upper bound of the bucket.
    • count: Number of elements in the bucket.
    • pre_sum: Cumulative count of elements up to the current bucket.
    • ndv: Number of distinct values in the bucket.

Examples

This example shows how the HISTOGRAM function analyzes the distribution of c_int values in the histagg table, returning bucket boundaries, distinct value counts, element counts, and cumulative counts:

CREATE TABLE histagg ( c_id INT, c_tinyint TINYINT, c_smallint SMALLINT, c_int INT ); INSERT INTO histagg VALUES (1, 10, 20, 30), (1, 11, 21, 33), (1, 11, 12, 13), (2, 21, 22, 23), (2, 31, 32, 33), (2, 10, 20, 30); SELECT HISTOGRAM(c_int) FROM histagg; ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ histogram(c_int) │ ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ [{"lower":"13","upper":"13","ndv":1,"count":1,"pre_sum":0},{"lower":"23","upper":"23","ndv":1,"count":1,"pre_sum":1},{"lower":"30","upper":"30","ndv":1,"count":2,"pre_sum":2},{"lower":"33","upper":"33","ndv":1,"count":2,"pre_sum":4}] │ └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

The result is returned as a JSON array:

[ { "lower": "13", "upper": "13", "ndv": 1, "count": 1, "pre_sum": 0 }, { "lower": "23", "upper": "23", "ndv": 1, "count": 1, "pre_sum": 1 }, { "lower": "30", "upper": "30", "ndv": 1, "count": 2, "pre_sum": 2 }, { "lower": "33", "upper": "33", "ndv": 1, "count": 2, "pre_sum": 4 } ]

This example shows how HISTOGRAM(2) groups c_int values into two buckets:

SELECT HISTOGRAM(2)(c_int) FROM histagg; -- Or SELECT HISTOGRAM(c_int, 2) FROM histagg; ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ histogram(2)(c_int) │ ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ [{"lower":"13","upper":"30","ndv":3,"count":4,"pre_sum":0},{"lower":"33","upper":"33","ndv":1,"count":2,"pre_sum":4}] │ └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

The result is returned as a JSON array:

[ { "lower": "13", "upper": "30", "ndv": 3, "count": 4, "pre_sum": 0 }, { "lower": "33", "upper": "33", "ndv": 1, "count": 2, "pre_sum": 4 } ]

Was this page helpful?