SQL SUBQUERIES


SQL
POSTGRES
INTERACTIVE

Complete SQL Subqueries Technical Guide

1. Fundamental Subquery Types
-- Scalar subquery (returns single value)
SELECT name, (SELECT MAX(price) FROM products) AS max_price
FROM categories;

-- Row subquery (returns single row)
SELECT * FROM employees
WHERE (department, salary) = 
  (SELECT department, MAX(salary) FROM employees GROUP BY department LIMIT 1);

-- Table subquery (returns result set)
SELECT * FROM 
  (SELECT product_id, SUM(quantity) AS total FROM orders GROUP BY product_id) AS product_totals
WHERE total > 100;
  1. Scalar: Returns exactly one value (one row, one column)
  2. Row: Returns one row with multiple columns
  3. Table: Returns full result set (used in FROM clause)
  4. Correlated: References outer query (executed repeatedly)
2. Subquery Operators
Comparison Operators
-- IN operator
SELECT * FROM products
WHERE category_id IN (SELECT id FROM categories WHERE active = 1);

-- EXISTS operator
SELECT * FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id AND o.total > 1000);

-- Comparison with ALL/ANY
SELECT * FROM employees
WHERE salary > ALL (SELECT salary FROM interns);
Specialized Forms
-- LATERAL joins (PostgreSQL)
SELECT d.name, e.name
FROM departments d,
LATERAL (SELECT name FROM employees WHERE department_id = d.id ORDER BY salary DESC LIMIT 3) e;

-- WITH clause (Common Table Expressions)
WITH regional_sales AS (
  SELECT region, SUM(amount) AS total_sales
  FROM orders GROUP BY region
)
SELECT region FROM regional_sales WHERE total_sales > 1000000;
3. Correlated Subqueries
Basic Correlated Subquery
-- Find employees earning more than their department average
SELECT e1.name, e1.salary
FROM employees e1
WHERE salary > (
  SELECT AVG(salary) 
  FROM employees e2 
  WHERE e2.department_id = e1.department_id
);
Performance Considerations
-- Often better rewritten as JOIN
SELECT e1.name, e1.salary
FROM employees e1
JOIN (
  SELECT department_id, AVG(salary) AS avg_salary
  FROM employees GROUP BY department_id
) dept_avg ON e1.department_id = dept_avg.department_id
WHERE e1.salary > dept_avg.avg_salary;
4. Subqueries in DML Operations
INSERT with Subquery
-- Copy active products to featured table
INSERT INTO featured_products (product_id, name)
SELECT id, name FROM products
WHERE active = 1 AND rating > 4.5;
UPDATE with Subquery
-- Update customer status based on purchases
UPDATE customers
SET premium = TRUE
WHERE id IN (
  SELECT customer_id 
  FROM orders 
  GROUP BY customer_id 
  HAVING SUM(total) > 5000
);
5. Subquery Performance
  1. Correlated subqueries: Often perform poorly - consider rewriting as joins
  2. EXISTS vs IN: EXISTS often faster for large datasets
  3. Materialized CTEs: Some databases optimize WITH clause results
  4. Indexing: Ensure join columns in subqueries are indexed
Optimization Example
-- Original (slow correlated subquery)
SELECT * FROM products p
WHERE (SELECT COUNT(*) FROM order_items oi WHERE oi.product_id = p.id) > 10;

-- Optimized version
SELECT p.* FROM products p
JOIN (
  SELECT product_id, COUNT(*) AS order_count
  FROM order_items
  GROUP BY product_id
  HAVING COUNT(*) > 10
) popular ON p.id = popular.product_id;

Advanced SQL Subquery Insights

1. Subquery Execution Mechanics
-- How databases process different subquery types
EXPLAIN ANALYZE
SELECT * FROM products
WHERE category_id IN (SELECT id FROM categories WHERE premium = true);

-- PostgreSQL example showing subquery scan
Seq Scan on products  (cost=0.00..145.00 rows=100 width=45)
  Filter: (hashed SubPlan 1)
  SubPlan 1
    -> Seq Scan on categories  (cost=0.00..22.75 rows=5 width=4)
         Filter: (premium = true)
  1. Materialization: Some databases cache subquery results
  2. Flattening: Optimizers may convert subqueries to joins
  3. Execution timing: Correlated subqueries run once per outer row
2. Recursive CTEs
-- Hierarchical data processing (PostgreSQL/SQL Server)
WITH RECURSIVE org_chart AS (
    -- Base case (anchor member)
    SELECT id, name, manager_id, 1 AS level
    FROM employees WHERE manager_id IS NULL
    
    UNION ALL
    
    -- Recursive member
    SELECT e.id, e.name, e.manager_id, oc.level + 1
    FROM employees e
    JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT * FROM org_chart ORDER BY level;

-- Oracle alternative
SELECT id, name, manager_id, LEVEL
FROM employees
START WITH manager_id IS NULL
CONNECT BY PRIOR id = manager_id;
  1. Must use UNION ALL in recursive members
  2. Termination condition is critical to prevent infinite loops
  3. Depth limits: Some databases restrict recursion depth (e.g., 100 in SQL Server)
3. Advanced Subquery Factoring
-- Multiple CTEs with dependencies (Oracle/PostgreSQL)
WITH 
  sales_data AS (
    SELECT product_id, SUM(amount) AS total_sales
    FROM orders GROUP BY product_id
  ),
  top_products AS (
    SELECT product_id FROM sales_data 
    WHERE total_sales > (SELECT AVG(total_sales) FROM sales_data)
  )
SELECT p.name, s.total_sales
FROM products p
JOIN sales_data s ON p.id = s.product_id
WHERE p.id IN (SELECT product_id FROM top_products);
  1. Readability: CTEs make complex queries more understandable
  2. Reusability: Reference CTEs multiple times
  3. Materialization hints: Some databases allow controlling CTE materialization
4. Subquery Anti-Patterns
-- Problem: Nested subqueries with same filter
SELECT * FROM (
  SELECT * FROM (
    SELECT * FROM orders
    WHERE status = 'completed'
  ) completed_orders
  WHERE total > 100
) large_orders;

-- Better: Apply filters once
SELECT * FROM orders
WHERE status = 'completed' AND total > 100;

-- Problem: Correlated subquery in SELECT
SELECT p.name, 
       (SELECT COUNT(*) FROM orders o WHERE o.product_id = p.id) AS order_count
FROM products p;

-- Better: LEFT JOIN with GROUP BY
SELECT p.name, COUNT(o.id) AS order_count
FROM products p
LEFT JOIN orders o ON p.id = o.product_id
GROUP BY p.id, p.name;
  1. Over-nesting: Deeply nested subqueries hurt readability and performance
  2. Redundant filtering: Apply filters at the appropriate level
  3. SELECT-list subqueries: Often better as joins
5. Modern Subquery Alternatives
-- LATERAL joins (PostgreSQL, Oracle, SQL Server)
SELECT d.name, latest.order_date
FROM departments d,
LATERAL (
  SELECT order_date FROM orders
  WHERE department_id = d.id
  ORDER BY order_date DESC LIMIT 1
) latest;

-- JSON aggregation (avoiding multiple subqueries)
SELECT p.name,
  (SELECT JSON_AGG(o) FROM (
    SELECT id, order_date FROM orders 
    WHERE product_id = p.id LIMIT 3
  ) o) AS recent_orders
FROM products p;

-- Window functions replacing correlated subqueries
SELECT id, name, salary,
  salary - AVG(salary) OVER (PARTITION BY department_id) AS diff_from_avg
FROM employees;
  1. LATERAL joins: Powerful alternative for correlated operations
  2. JSON/XML aggregation: Combine related data without multiple subqueries
  3. Window functions: Often more efficient than self-joins or correlated subqueries