SQL UNION


SQL
POSTGRES
INTERACTIVE

Complete SQL UNION & UNION ALL Technical Guide

1. Fundamental Syntax
-- Standard UNION (removes duplicates)
SELECT column1, column2 FROM table1
UNION
SELECT column1, column2 FROM table2;

-- UNION ALL (keeps duplicates)
SELECT column1, column2 FROM table1
UNION ALL
SELECT column1, column2 FROM table2;
  1. Column Matching: All SELECTs must have same number of columns
  2. Data Type Compatibility: Corresponding columns must have compatible types
  3. Ordering: Final ORDER BY applies to combined result set
  4. Performance: UNION ALL is faster as it doesn't remove duplicates
2. Practical Use Cases
Combining Similar Data
-- Combine current and archived orders
SELECT order_id, customer_id, order_date, 'current' AS source
FROM current_orders
UNION ALL
SELECT order_id, customer_id, order_date, 'archive' AS source
FROM archived_orders
ORDER BY order_date DESC;
Creating Summary Reports
-- Combine data from multiple years
SELECT '2023' AS year, product_id, SUM(quantity) AS total_sales
FROM sales_2023
GROUP BY product_id
UNION ALL
SELECT '2022' AS year, product_id, SUM(quantity) AS total_sales
FROM sales_2022
GROUP BY product_id
ORDER BY year, total_sales DESC;
3. Advanced Techniques
UNION with Different Tables
-- Combine customers and employees for notification list
SELECT name, email, 'customer' AS type FROM customers
UNION
SELECT first_name + ' ' + last_name, email, 'employee' AS type FROM employees
WHERE active = 1
ORDER BY type, name;
UNION in Views and CTEs
-- Using UNION in a view
CREATE VIEW combined_inventory AS
SELECT product_id, quantity, 'warehouse' AS location FROM warehouse_inventory
UNION ALL
SELECT product_id, quantity, 'store' AS location FROM store_inventory;

-- Using UNION in a CTE
WITH combined_data AS (
    SELECT id, name FROM table1
    UNION ALL
    SELECT id, name FROM table2
)
SELECT * FROM combined_data WHERE id > 1000;
4. Performance Considerations
UNION vs UNION ALL
-- UNION (with duplicate removal)
SELECT product_id FROM recent_orders
UNION
SELECT product_id FROM old_orders;
-- Typically uses a SORT operation to remove duplicates

-- UNION ALL (no duplicate removal)
SELECT product_id FROM recent_orders
UNION ALL
SELECT product_id FROM old_orders;
-- Simply concatenates results without sorting
Optimizing Large Unions
-- Add filters to each SELECT to reduce data
SELECT id, name FROM large_table1 WHERE created_date > '2023-01-01'
UNION ALL
SELECT id, name FROM large_table2 WHERE created_date > '2023-01-01'

-- Consider temporary tables for complex unions
SELECT id, name INTO #temp_results FROM (
    SELECT id, name FROM table1 WHERE condition1
    UNION ALL
    SELECT id, name FROM table2 WHERE condition2
) AS combined;
5. Database-Specific Behaviors
  1. MySQL: Doesn't support INTERSECT/EXCEPT but can emulate with UNION + JOIN
  2. PostgreSQL: Supports UNION in recursive CTEs for hierarchical queries
  3. SQL Server: Allows TOP/LIMIT in individual UNION SELECTs (with parentheses)
  4. Oracle: Requires UNION subqueries to have matching data types more strictly
Recursive UNION Example
-- Hierarchical query (PostgreSQL)
WITH RECURSIVE org_hierarchy AS (
    -- Base case
    SELECT id, name, manager_id, 1 AS level
    FROM employees
    WHERE manager_id IS NULL
    
    UNION
    
    -- Recursive case
    SELECT e.id, e.name, e.manager_id, h.level + 1
    FROM employees e
    JOIN org_hierarchy h ON e.manager_id = h.id
)
SELECT * FROM org_hierarchy ORDER BY level;

Advanced UNION & UNION ALL Insights

1. NULL Handling in Unions
-- NULL values are considered equal in UNION duplicate removal
SELECT 1 AS id, 'A' AS code UNION SELECT 1, NULL;
-- Returns 2 rows (1,'A') and (1,NULL)

SELECT 1 AS id, NULL AS code UNION SELECT 1, NULL;
-- Returns 1 row (1,NULL) - duplicates removed
  1. UNION considers NULLs equal for duplicate elimination
  2. UNION ALL preserves all NULL values exactly as they appear
  3. Some databases differ in NULL comparison behavior
2. Large Dataset Strategies
-- Chunking large unions (SQL Server example)
SELECT * INTO #results FROM (
    SELECT * FROM table1 WHERE id BETWEEN 1 AND 100000
    UNION ALL
    SELECT * FROM table1 WHERE id BETWEEN 100001 AND 200000
    UNION ALL
    -- Continue chunking...
) AS combined_data;

-- Using temp tables with indexes
SELECT col1, col2 INTO #temp1 FROM source1 WHERE condition;
CREATE INDEX idx_temp ON #temp1(col1);

SELECT col1, col2 INTO #temp2 FROM source2 WHERE condition;
CREATE INDEX idx_temp ON #temp2(col1);

SELECT * FROM #temp1 UNION ALL SELECT * FROM #temp2;
  1. Batch processing for memory management
  2. Temp table indexing improves performance
  3. Consider materialized views for frequent unions
3. UNION in DML Operations
-- INSERT with UNION
INSERT INTO target_table (col1, col2)
SELECT col1, col2 FROM source1
UNION ALL
SELECT col1, col2 FROM source2;

-- UPDATE from UNION result (SQL Server)
WITH combined AS (
    SELECT id, status FROM active_users
    UNION ALL
    SELECT id, status FROM inactive_users
)
UPDATE u
SET u.status = c.status
FROM users u
JOIN combined c ON u.id = c.id;

-- DELETE using UNION in subquery
DELETE FROM products
WHERE id IN (
    SELECT product_id FROM discontinued
    UNION
    SELECT product_id FROM recalled_items
);
  1. INSERT-SELECT-UNION patterns for data migration
  2. CTEs with UNION for complex DML logic
  3. Watch for transaction size with large operations
4. When NOT to Use UNION
-- Instead of UNION for conditional logic:
SELECT id, 
       CASE WHEN source = 'A' THEN amount ELSE 0 END AS amount_a,
       CASE WHEN source = 'B' THEN amount ELSE 0 END AS amount_b
FROM transactions;

-- Instead of UNION for same-table grouping:
SELECT 
    SUM(CASE WHEN type = 'SALE' THEN amount END) AS sales,
    SUM(CASE WHEN type = 'RETURN' THEN amount END) AS returns
FROM transactions;

-- Instead of UNION ALL for small fixed datasets:
SELECT * FROM (VALUES 
    (1, 'Apple'), 
    (2, 'Banana'),
    (3, 'Cherry')
) AS fruits(id, name);
  1. CASE expressions for conditional columns
  2. PIVOT operations for cross-tab reports
  3. VALUES clause for small static datasets
5. Debugging UNION Queries
-- Add debugging columns to identify source
SELECT 'source1' AS source, col1, col2 FROM table1
UNION ALL
SELECT 'source2' AS source, col1, col2 FROM table2;

-- Check individual query results before union
WITH q1 AS (SELECT col1, col2 FROM table1 WHERE ...),
     q2 AS (SELECT col1, col2 FROM table2 WHERE ...)
SELECT 'q1' AS query, COUNT(*) FROM q1
UNION ALL
SELECT 'q2' AS query, COUNT(*) FROM q2;

-- Validate column matching
EXCEPTION WHEN USING UNION:
"ERROR: each UNION query must have the same number of columns"
  1. Source tagging helps trace result origins
  2. Test components separately before combining
  3. Common errors: column count mismatch, type conflicts