SQL UNION

Complete SQL UNION & UNION ALL Technical Guide

1. Fundamental Syntax

-- Standard UNION (removes duplicates)
SELECT column1, column2 FROM table1
UNION
SELECT column1, column2 FROM table2;

-- UNION ALL (keeps duplicates)
SELECT column1, column2 FROM table1
UNION ALL
SELECT column1, column2 FROM table2;

Column Matching: All SELECTs must have same number of columns
Data Type Compatibility: Corresponding columns must have compatible types
Ordering: Final ORDER BY applies to combined result set
Performance: UNION ALL is faster as it doesn't remove duplicates

2. Practical Use Cases

Combining Similar Data

-- Combine current and archived orders
SELECT order_id, customer_id, order_date, 'current' AS source
FROM current_orders
UNION ALL
SELECT order_id, customer_id, order_date, 'archive' AS source
FROM archived_orders
ORDER BY order_date DESC;

Creating Summary Reports

-- Combine data from multiple years
SELECT '2023' AS year, product_id, SUM(quantity) AS total_sales
FROM sales_2023
GROUP BY product_id
UNION ALL
SELECT '2022' AS year, product_id, SUM(quantity) AS total_sales
FROM sales_2022
GROUP BY product_id
ORDER BY year, total_sales DESC;

3. Advanced Techniques

UNION with Different Tables

-- Combine customers and employees for notification list
SELECT name, email, 'customer' AS type FROM customers
UNION
SELECT first_name + ' ' + last_name, email, 'employee' AS type FROM employees
WHERE active = 1
ORDER BY type, name;

UNION in Views and CTEs

-- Using UNION in a view
CREATE VIEW combined_inventory AS
SELECT product_id, quantity, 'warehouse' AS location FROM warehouse_inventory
UNION ALL
SELECT product_id, quantity, 'store' AS location FROM store_inventory;

-- Using UNION in a CTE
WITH combined_data AS (
    SELECT id, name FROM table1
    UNION ALL
    SELECT id, name FROM table2
)
SELECT * FROM combined_data WHERE id > 1000;

4. Performance Considerations

UNION vs UNION ALL

-- UNION (with duplicate removal)
SELECT product_id FROM recent_orders
UNION
SELECT product_id FROM old_orders;
-- Typically uses a SORT operation to remove duplicates

-- UNION ALL (no duplicate removal)
SELECT product_id FROM recent_orders
UNION ALL
SELECT product_id FROM old_orders;
-- Simply concatenates results without sorting

Optimizing Large Unions

-- Add filters to each SELECT to reduce data
SELECT id, name FROM large_table1 WHERE created_date > '2023-01-01'
UNION ALL
SELECT id, name FROM large_table2 WHERE created_date > '2023-01-01'

-- Consider temporary tables for complex unions
SELECT id, name INTO #temp_results FROM (
    SELECT id, name FROM table1 WHERE condition1
    UNION ALL
    SELECT id, name FROM table2 WHERE condition2
) AS combined;

5. Database-Specific Behaviors

MySQL: Doesn't support INTERSECT/EXCEPT but can emulate with UNION + JOIN
PostgreSQL: Supports UNION in recursive CTEs for hierarchical queries
SQL Server: Allows TOP/LIMIT in individual UNION SELECTs (with parentheses)
Oracle: Requires UNION subqueries to have matching data types more strictly

Recursive UNION Example

-- Hierarchical query (PostgreSQL)
WITH RECURSIVE org_hierarchy AS (
    -- Base case
    SELECT id, name, manager_id, 1 AS level
    FROM employees
    WHERE manager_id IS NULL
    
    UNION
    
    -- Recursive case
    SELECT e.id, e.name, e.manager_id, h.level + 1
    FROM employees e
    JOIN org_hierarchy h ON e.manager_id = h.id
)
SELECT * FROM org_hierarchy ORDER BY level;

Advanced UNION & UNION ALL Insights

1. NULL Handling in Unions

-- NULL values are considered equal in UNION duplicate removal
SELECT 1 AS id, 'A' AS code UNION SELECT 1, NULL;
-- Returns 2 rows (1,'A') and (1,NULL)

SELECT 1 AS id, NULL AS code UNION SELECT 1, NULL;
-- Returns 1 row (1,NULL) - duplicates removed

UNION considers NULLs equal for duplicate elimination
UNION ALL preserves all NULL values exactly as they appear
Some databases differ in NULL comparison behavior

2. Large Dataset Strategies

-- Chunking large unions (SQL Server example)
SELECT * INTO #results FROM (
    SELECT * FROM table1 WHERE id BETWEEN 1 AND 100000
    UNION ALL
    SELECT * FROM table1 WHERE id BETWEEN 100001 AND 200000
    UNION ALL
    -- Continue chunking...
) AS combined_data;

-- Using temp tables with indexes
SELECT col1, col2 INTO #temp1 FROM source1 WHERE condition;
CREATE INDEX idx_temp ON #temp1(col1);

SELECT col1, col2 INTO #temp2 FROM source2 WHERE condition;
CREATE INDEX idx_temp ON #temp2(col1);

SELECT * FROM #temp1 UNION ALL SELECT * FROM #temp2;

Batch processing for memory management
Temp table indexing improves performance
Consider materialized views for frequent unions

3. UNION in DML Operations

-- INSERT with UNION
INSERT INTO target_table (col1, col2)
SELECT col1, col2 FROM source1
UNION ALL
SELECT col1, col2 FROM source2;

-- UPDATE from UNION result (SQL Server)
WITH combined AS (
    SELECT id, status FROM active_users
    UNION ALL
    SELECT id, status FROM inactive_users
)
UPDATE u
SET u.status = c.status
FROM users u
JOIN combined c ON u.id = c.id;

-- DELETE using UNION in subquery
DELETE FROM products
WHERE id IN (
    SELECT product_id FROM discontinued
    UNION
    SELECT product_id FROM recalled_items
);

INSERT-SELECT-UNION patterns for data migration
CTEs with UNION for complex DML logic
Watch for transaction size with large operations

4. When NOT to Use UNION

-- Instead of UNION for conditional logic:
SELECT id, 
       CASE WHEN source = 'A' THEN amount ELSE 0 END AS amount_a,
       CASE WHEN source = 'B' THEN amount ELSE 0 END AS amount_b
FROM transactions;

-- Instead of UNION for same-table grouping:
SELECT 
    SUM(CASE WHEN type = 'SALE' THEN amount END) AS sales,
    SUM(CASE WHEN type = 'RETURN' THEN amount END) AS returns
FROM transactions;

-- Instead of UNION ALL for small fixed datasets:
SELECT * FROM (VALUES 
    (1, 'Apple'), 
    (2, 'Banana'),
    (3, 'Cherry')
) AS fruits(id, name);

CASE expressions for conditional columns
PIVOT operations for cross-tab reports
VALUES clause for small static datasets

5. Debugging UNION Queries

-- Add debugging columns to identify source
SELECT 'source1' AS source, col1, col2 FROM table1
UNION ALL
SELECT 'source2' AS source, col1, col2 FROM table2;

-- Check individual query results before union
WITH q1 AS (SELECT col1, col2 FROM table1 WHERE ...),
     q2 AS (SELECT col1, col2 FROM table2 WHERE ...)
SELECT 'q1' AS query, COUNT(*) FROM q1
UNION ALL
SELECT 'q2' AS query, COUNT(*) FROM q2;

-- Validate column matching
EXCEPTION WHEN USING UNION:
"ERROR: each UNION query must have the same number of columns"

Source tagging helps trace result origins
Test components separately before combining
Common errors: column count mismatch, type conflicts