📣
TiDB Cloud Premium is now in public preview. Unlimited growth, instant elasticity, advanced security for enterprise workloads. Try it out →

APPROX_COUNT_DISTINCT



Estimates the number of distinct values in a data set with the HyperLogLog algorithm.

The HyperLogLog algorithm provides an approximation of the number of unique elements using little memory and time. Consider using this function when dealing with large data sets where an estimated result can be accepted. In exchange for some accuracy, this is a fast and efficient method of returning distinct counts.

To get an accurate result, use COUNT_DISTINCT. See Examples for more explanations.

Syntax

APPROX_COUNT_DISTINCT(<expr>)

Return Type

Integer.

Example

Create a Table and Insert Sample Data

CREATE TABLE user_events ( id INT, user_id INT, event_name VARCHAR ); INSERT INTO user_events (id, user_id, event_name) VALUES (1, 1, 'Login'), (2, 2, 'Login'), (3, 3, 'Login'), (4, 1, 'Logout'), (5, 2, 'Logout'), (6, 4, 'Login'), (7, 1, 'Login');

Query Demo: Estimate the Number of Distinct User IDs

SELECT APPROX_COUNT_DISTINCT(user_id) AS approx_distinct_user_count FROM user_events;

Result

| approx_distinct_user_count | |----------------------------| | 4 |

Was this page helpful?