Document

1. Fundamental Syntax

SELECT column1, aggregate_function(column2) 
FROM table_name
[WHERE conditions] 
GROUP BY column1 [HAVING group_condition]
[ORDER BY columns];

2. Internal Execution Process

Data Retrieval Phase
1. Base data fetched according to WHERE conditions
2. May use index-only scans if all columns are covered
Grouping Phase
1. Hash Aggregation (most common):
  1. Computes hash value for each group key
  2. Maintains hash table with aggregate values
  3. Memory-intensive but single-pass operation
2. Sort-Based Grouping (legacy/fallback):
  1. Sorts data by GROUP BY columns
  2. Scans sorted data to compute aggregates
  3. Requires temporary storage for large datasets
Filtering Phase
1. HAVING conditions applied to groups
2. Different from WHERE which filters rows before grouping

3. Performance Optimization

Index Strategies

-- Optimal index for GROUP BY + WHERE 
CREATE INDEX idx_category_status 
ON products(category, status); 

-- Covering index example 
CREATE INDEX idx_covering 
ON orders(customer_id, order_date) 
INCLUDE (total_amount);

Memory Configuration

MySQL: tmp_table_size, max_heap_table_size
PostgreSQL: work_mem
SQL Server: query memory grant

4. Advanced GROUP BY Patterns

GROUPING SETS

SELECT department_id, job_title, COUNT(*)
FROM employees
GROUP BY GROUPING SETS (
  (department_id),
  (job_title),
  (department_id, job_title),
  ()
);

CUBE (All combinations)

SELECT brand, category, size, SUM(sales) 
FROM products
GROUP BY CUBE (brand, category, size);

ROLLUP (Hierarchical aggregation)

SELECT year, quarter, month, SUM(revenue) 
FROM sales 
GROUP BY ROLLUP (year, quarter, month);

⚠️ Common Anti-Patterns

SELECTing non-aggregated columns not in GROUP BY
Using HAVING for row-level filtering (use WHERE instead)
Over-aggregating large datasets when sampling would suffice
Ignoring NULL groups (NULLs form their own group)

5. Database-Specific Implementations

MySQL

Uses temporary tables for complex GROUP BY
ONLY_FULL_GROUP_BY mode enforces SQL standard
Index visibility affects GROUP BY optimization

PostgreSQL

Advanced hash aggregation with memory spill to disk
Parallel aggregation for large datasets
Supports all ANSI grouping extensions

SQL Server

Stream aggregation operator for sorted data
Hash match operator for unsorted data
Memory grants based on cardinality estimates

Oracle

HASH GROUP BY operation
SORT GROUP BY for sorted data
Advanced analytic functions complement GROUP BY

💡 Expert Optimization Techniques

Two-Phase Aggregation

SELECT customer_id, SUM(total)
FROM (
  SELECT customer_id, SUM(amount) AS total
  FROM orders
  GROUP BY customer_id, DATE_TRUNC('month', order_date)
) subq
GROUP BY customer_id;

Materialized View

CREATE MATERIALIZED VIEW sales_summary AS
SELECT product_id, SUM(quantity) AS total_qty, AVG(price) AS avg_price
FROM sales
GROUP BY product_id
WITH DATA;

SQL GROUP BY

SQL GROUP BY: Complete Technical Guide

1. Fundamental Syntax

2. Internal Execution Process

3. Performance Optimization

Index Strategies

Memory Configuration

4. Advanced GROUP BY Patterns

GROUPING SETS

CUBE (All combinations)

ROLLUP (Hierarchical aggregation)

⚠️ Common Anti-Patterns

5. Database-Specific Implementations

MySQL

PostgreSQL

SQL Server

Oracle

💡 Expert Optimization Techniques

Two-Phase Aggregation

Materialized View