📣
TiDB Cloud Premium is now in public preview. Unlimited growth, instant elasticity, advanced security for enterprise workloads. Try it out →

Aggregating Index



Aggregating indexes dramatically accelerate analytical queries by precomputing and storing aggregation results, eliminating the need to scan entire tables for common analytics operations.

What Problem Does It Solve?

Analytical queries on large datasets face significant performance challenges:

ProblemImpactAggregating Index Solution
Full Table ScansSUM, COUNT, MIN, MAX queries scan millions of rowsRead precomputed results instantly
Repeated CalculationsSame aggregations computed over and overStore results once, reuse many times
Slow Dashboard QueriesAnalytics dashboards take minutes to loadSub-second response for common metrics
High Compute CostsHeavy aggregation workloads consume resourcesMinimal compute for cached results
Poor User ExperienceUsers wait for reports and analyticsInstant results for business intelligence

Example: A sales analytics query SELECT SUM(revenue), COUNT(*) FROM sales WHERE region = 'US' on 100M rows. Without aggregating index, it scans all US sales records. With aggregating index, it returns precomputed results instantly.

How It Works

  1. Index Creation → Define aggregation queries to precompute
  2. Result Storage → TiDB Cloud Lake stores aggregated results in optimized blocks
  3. Query Matching → Incoming queries automatically use precomputed results
  4. Automatic Updates → Results refresh when underlying data changes

Quick Setup

-- Create table with sample data CREATE TABLE sales(region VARCHAR, product VARCHAR, revenue DECIMAL, quantity INT); -- Create aggregating index for common analytics CREATE AGGREGATING INDEX sales_summary AS SELECT region, SUM(revenue), COUNT(*), AVG(quantity) FROM sales GROUP BY region; -- Refresh the index (manual mode) REFRESH AGGREGATING INDEX sales_summary; -- Verify the index is used EXPLAIN SELECT region, SUM(revenue) FROM sales GROUP BY region;

Supported Operations

✅ Supported❌ Not Supported
SUM, COUNT, MIN, MAX, AVGWindow Functions
GROUP BY clausesGROUPING SETS
WHERE filtersORDER BY, LIMIT
Simple aggregationsComplex subqueries

Refresh Strategies

StrategyWhen to UseConfiguration
Automatic (SYNC)Real-time analytics, small datasetsCREATE AGGREGATING INDEX ... SYNC
ManualLarge datasets, batch processingCREATE AGGREGATING INDEX ... (default)
Background (Cloud)Production workloadsAutomatic in TiDB Cloud Lake

Automatic vs Manual Refresh

-- Automatic refresh (updates with every data change) CREATE AGGREGATING INDEX auto_summary AS SELECT region, SUM(revenue) FROM sales GROUP BY region SYNC; -- Manual refresh (update on demand) CREATE AGGREGATING INDEX manual_summary AS SELECT region, SUM(revenue) FROM sales GROUP BY region; REFRESH AGGREGATING INDEX manual_summary;

Performance Example

This example shows the dramatic performance improvement:

-- Prepare data CREATE TABLE agg(a int, b int, c int); INSERT INTO agg VALUES (1,1,4), (1,2,1), (1,2,4), (2,2,5); -- Create an aggregating index CREATE AGGREGATING INDEX my_agg_index AS SELECT MIN(a), MAX(c) FROM agg; -- Refresh the aggregating index REFRESH AGGREGATING INDEX my_agg_index; -- Verify if the aggregating index works EXPLAIN SELECT MIN(a), MAX(c) FROM agg; -- Key indicators in the execution plan: -- ├── aggregating index: [SELECT MIN(a), MAX(c) FROM default.agg] -- ├── rewritten query: [selection: [index_col_0 (#0), index_col_1 (#1)]] -- This shows the query uses precomputed results instead of scanning raw data

Best Practices

PracticeBenefit
Index Common QueriesFocus on frequently executed analytics
Use Manual RefreshBetter control over update timing
Monitor Index UsageUse EXPLAIN to verify index utilization
Clean Up Unused IndexesRemove indexes that aren't being used
Match Query PatternsIndex filters should match actual queries

Management Commands

CommandPurpose
CREATE AGGREGATING INDEXCreate new aggregating index
REFRESH AGGREGATING INDEXUpdate index with latest data
DROP AGGREGATING INDEXRemove index (use VACUUM TABLE to clean storage)
SHOW AGGREGATING INDEXESList all indexes

Important Notes

When to Use Aggregating Indexes:

  • Frequent analytical queries (dashboards, reports)
  • Large datasets with repeated aggregations
  • Stable query patterns
  • Performance-critical applications

When NOT to Use:

  • Frequently changing data
  • One-time analytical queries
  • Simple queries on small tables

Configuration

-- Enable/disable aggregating index feature SET enable_aggregating_index_scan = 1; -- Enable (default) SET enable_aggregating_index_scan = 0; -- Disable

Aggregating indexes are most effective for repetitive analytical workloads on large datasets. Start with your most common dashboard and reporting queries.

Was this page helpful?