SQL JOIN


SQL
POSTGRES
INTERACTIVE

Complete SQL JOIN Technical Reference

1. Core JOIN Fundamentals

JOINs combine data from multiple tables based on logical relationships. The major types include:

  1. INNER JOIN: Returns only matching rows from both tables
  2. LEFT JOIN: All rows from left table + matched rows from right
  3. RIGHT JOIN: All rows from right table + matched rows from left
  4. FULL JOIN: All rows from both tables (union of left and right)
  5. CROSS JOIN: Cartesian product of all rows
Example:
SELECT * FROM employees e 
INNER JOIN departments d 
ON e.dept_id = d.id
2. JOIN Algorithm Internals
Nested Loops

For each row in outer table, scan inner table. Best for small datasets or indexed joins.

Complexity: O(M*N)

Hash Join

Builds hash table from smaller table, probes with larger table. Ideal for unsorted data.

Complexity: O(M+N) in memory

Merge Join

Simultaneous scan of sorted inputs. Requires pre-sorted data or indexes.

Complexity: O(M+N)

3. Advanced JOIN Patterns
Self Joins
SELECT e.name, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id
Lateral Joins
SELECT d.name, e.name
FROM departments d,
LATERAL (SELECT * FROM employees 
         WHERE dept_id = d.id 
         ORDER BY salary DESC 
         LIMIT 3) e
4. JOIN Optimization
Indexing Strategies
  1. Create indexes on all join keys
  2. Covering indexes to avoid table lookups
  3. Composite indexes matching join order
Execution Hints
  1. MySQL: /*+ JOIN_ORDER(t1, t2) */
  2. SQL Server: OPTION (FORCE ORDER)
  3. Oracle: /*+ LEADING(t1) USE_NL(t2) */
5. Distributed JOINs
Shuffle Join

Redistributes both tables across nodes by join key

Broadcast Join

Replicates small table to all nodes containing large table

Partitioned Join

Joins pre-partitioned tables without reshuffling

6. Specialized JOIN Types
ASOF Joins
-- Find most recent price before trade
SELECT t.trade_time, p.price
FROM trades t
ASOF JOIN prices p ON t.stock_id = p.stock_id 
  AND t.trade_time >= p.price_time
Range Joins
-- Find employees active during project
SELECT e.name, p.name
FROM employees e
JOIN projects p ON e.hire_date <= p.end_date
  AND (e.termination_date >= p.start_date 
       OR e.termination_date IS NULL)
⚠️ JOIN Performance Pitfalls
  1. Cartesian products from missing/incorrect ON clauses
  2. Data type mismatches causing implicit conversions
  3. Over-joining when subqueries would suffice
  4. NULL handling in join conditions (NULL ≠ NULL)
  5. Skewed data causing uneven workload distribution
Database-Specific JOIN Implementations
MySQL
  1. Block Nested Loop as fallback algorithm
  2. join_buffer_size controls memory for joins
  3. Limited to one join algorithm per query
PostgreSQL
  1. Advanced Bitmap Heap Scan for multi-index joins
  2. Parallel hash joins for large datasets
  3. Supports all ANSI join types including LATERAL
Advanced Optimization
  1. Join Order Selection: How query planners determine optimal table join order
  2. Join Elimination: When optimizers remove unnecessary joins (PK/FK relationships)
  3. Predicate Pushdown: Pushing filter conditions into join operations
  4. Bloom Filters: Probabilistic filters for hash joins in distributed systems
  5. Join Collocation: In distributed DBs when joined tables share distribution keys
Specialized Joins
  1. ASOF Joins: For temporal data (nearest timestamp matches)
  2. Range Joins: Matching values within ranges (date intervals, number ranges)
  3. Semijoins (WHERE EXISTS): Optimized for "included in" relationships
  4. Antijoins (WHERE NOT EXISTS): Optimized for "not in" relationships
  5. Recursive Joins: For hierarchical data (org charts, bill of materials)
Distributed Joins
  1. Shuffle Joins: Redistuting data across nodes for joining
  2. Broadcast Joins: Replicating small tables to all nodes
  3. Partitioned Joins: When both tables share partitioning scheme
  4. Skew Handling: Techniques for uneven data distribution
  5. Approximate Joins: Probabilistic data structures for big data
Performance Internals
  1. Memory Management: How hash joins handle memory spills to disk
  2. Parallel Joins: Multi-threaded join execution strategies
  3. Vectorized Joins: SIMD-optimized implementations
  4. Join Statistics: How cardinality estimates affect plans
  5. Join Hint Systems: Database-specific hint syntaxes
DB-Specific Features
  1. Oracle: Star transformations for data warehouses
  2. SQL Server: Adaptive join operators
  3. PostgreSQL: TID scans for join optimization
  4. MySQL: Block nested-loop variations
  5. Snowflake/BigQuery: Cluster-aware joins
Practical Aspects
  1. Join Cardinality: Estimating result set sizes
  2. NULL Handling: Three-valued logic implications
  3. Join vs Subquery: When to prefer each approach
  4. Materialized Join Views: Pre-computing expensive joins
  5. Join Patterns for ETL: Slowly changing dimensions