Complete SQL Subqueries Technical Guide
1. Fundamental Subquery Types
-- Scalar subquery (returns single value)
SELECT name, (SELECT MAX(price) FROM products) AS max_price
FROM categories;
-- Row subquery (returns single row)
SELECT * FROM employees
WHERE (department, salary) =
(SELECT department, MAX(salary) FROM employees GROUP BY department LIMIT 1);
-- Table subquery (returns result set)
SELECT * FROM
(SELECT product_id, SUM(quantity) AS total FROM orders GROUP BY product_id) AS product_totals
WHERE total > 100;
-
Scalar: Returns exactly one value (one row,
one column)
-
Row: Returns one row with multiple columns
-
Table: Returns full result set (used in FROM
clause)
-
Correlated: References outer query (executed
repeatedly)
2. Subquery Operators
Comparison Operators
-- IN operator
SELECT * FROM products
WHERE category_id IN (SELECT id FROM categories WHERE active = 1);
-- EXISTS operator
SELECT * FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id AND o.total > 1000);
-- Comparison with ALL/ANY
SELECT * FROM employees
WHERE salary > ALL (SELECT salary FROM interns);
Specialized Forms
-- LATERAL joins (PostgreSQL)
SELECT d.name, e.name
FROM departments d,
LATERAL (SELECT name FROM employees WHERE department_id = d.id ORDER BY salary DESC LIMIT 3) e;
-- WITH clause (Common Table Expressions)
WITH regional_sales AS (
SELECT region, SUM(amount) AS total_sales
FROM orders GROUP BY region
)
SELECT region FROM regional_sales WHERE total_sales > 1000000;
3. Correlated Subqueries
Basic Correlated Subquery
-- Find employees earning more than their department average
SELECT e1.name, e1.salary
FROM employees e1
WHERE salary > (
SELECT AVG(salary)
FROM employees e2
WHERE e2.department_id = e1.department_id
);
Performance Considerations
-- Often better rewritten as JOIN
SELECT e1.name, e1.salary
FROM employees e1
JOIN (
SELECT department_id, AVG(salary) AS avg_salary
FROM employees GROUP BY department_id
) dept_avg ON e1.department_id = dept_avg.department_id
WHERE e1.salary > dept_avg.avg_salary;
4. Subqueries in DML Operations
INSERT with Subquery
-- Copy active products to featured table
INSERT INTO featured_products (product_id, name)
SELECT id, name FROM products
WHERE active = 1 AND rating > 4.5;
UPDATE with Subquery
-- Update customer status based on purchases
UPDATE customers
SET premium = TRUE
WHERE id IN (
SELECT customer_id
FROM orders
GROUP BY customer_id
HAVING SUM(total) > 5000
);
5. Subquery Performance
-
Correlated subqueries: Often perform poorly -
consider rewriting as joins
-
EXISTS vs IN: EXISTS often faster for large
datasets
-
Materialized CTEs: Some databases optimize
WITH clause results
-
Indexing: Ensure join columns in subqueries
are indexed
Optimization Example
-- Original (slow correlated subquery)
SELECT * FROM products p
WHERE (SELECT COUNT(*) FROM order_items oi WHERE oi.product_id = p.id) > 10;
-- Optimized version
SELECT p.* FROM products p
JOIN (
SELECT product_id, COUNT(*) AS order_count
FROM order_items
GROUP BY product_id
HAVING COUNT(*) > 10
) popular ON p.id = popular.product_id;
Advanced SQL Subquery Insights
1. Subquery Execution Mechanics
-- How databases process different subquery types
EXPLAIN ANALYZE
SELECT * FROM products
WHERE category_id IN (SELECT id FROM categories WHERE premium = true);
-- PostgreSQL example showing subquery scan
Seq Scan on products (cost=0.00..145.00 rows=100 width=45)
Filter: (hashed SubPlan 1)
SubPlan 1
-> Seq Scan on categories (cost=0.00..22.75 rows=5 width=4)
Filter: (premium = true)
-
Materialization: Some databases cache
subquery results
-
Flattening: Optimizers may convert subqueries
to joins
-
Execution timing: Correlated subqueries run
once per outer row
2. Recursive CTEs
-- Hierarchical data processing (PostgreSQL/SQL Server)
WITH RECURSIVE org_chart AS (
-- Base case (anchor member)
SELECT id, name, manager_id, 1 AS level
FROM employees WHERE manager_id IS NULL
UNION ALL
-- Recursive member
SELECT e.id, e.name, e.manager_id, oc.level + 1
FROM employees e
JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT * FROM org_chart ORDER BY level;
-- Oracle alternative
SELECT id, name, manager_id, LEVEL
FROM employees
START WITH manager_id IS NULL
CONNECT BY PRIOR id = manager_id;
-
Must use UNION ALL in recursive members
-
Termination condition is critical to prevent
infinite loops
-
Depth limits: Some databases restrict
recursion depth (e.g., 100 in SQL Server)
3. Advanced Subquery Factoring
-- Multiple CTEs with dependencies (Oracle/PostgreSQL)
WITH
sales_data AS (
SELECT product_id, SUM(amount) AS total_sales
FROM orders GROUP BY product_id
),
top_products AS (
SELECT product_id FROM sales_data
WHERE total_sales > (SELECT AVG(total_sales) FROM sales_data)
)
SELECT p.name, s.total_sales
FROM products p
JOIN sales_data s ON p.id = s.product_id
WHERE p.id IN (SELECT product_id FROM top_products);
-
Readability: CTEs make complex queries more
understandable
-
Reusability: Reference CTEs multiple times
-
Materialization hints: Some databases allow
controlling CTE materialization
4. Subquery Anti-Patterns
-- Problem: Nested subqueries with same filter
SELECT * FROM (
SELECT * FROM (
SELECT * FROM orders
WHERE status = 'completed'
) completed_orders
WHERE total > 100
) large_orders;
-- Better: Apply filters once
SELECT * FROM orders
WHERE status = 'completed' AND total > 100;
-- Problem: Correlated subquery in SELECT
SELECT p.name,
(SELECT COUNT(*) FROM orders o WHERE o.product_id = p.id) AS order_count
FROM products p;
-- Better: LEFT JOIN with GROUP BY
SELECT p.name, COUNT(o.id) AS order_count
FROM products p
LEFT JOIN orders o ON p.id = o.product_id
GROUP BY p.id, p.name;
-
Over-nesting: Deeply nested subqueries hurt
readability and performance
-
Redundant filtering: Apply filters at the
appropriate level
-
SELECT-list subqueries: Often better as joins
5. Modern Subquery Alternatives
-- LATERAL joins (PostgreSQL, Oracle, SQL Server)
SELECT d.name, latest.order_date
FROM departments d,
LATERAL (
SELECT order_date FROM orders
WHERE department_id = d.id
ORDER BY order_date DESC LIMIT 1
) latest;
-- JSON aggregation (avoiding multiple subqueries)
SELECT p.name,
(SELECT JSON_AGG(o) FROM (
SELECT id, order_date FROM orders
WHERE product_id = p.id LIMIT 3
) o) AS recent_orders
FROM products p;
-- Window functions replacing correlated subqueries
SELECT id, name, salary,
salary - AVG(salary) OVER (PARTITION BY department_id) AS diff_from_avg
FROM employees;
-
LATERAL joins: Powerful alternative for
correlated operations
-
JSON/XML aggregation: Combine related data
without multiple subqueries
-
Window functions: Often more efficient than
self-joins or correlated subqueries