SQL INDEXES

Complete SQL CREATE INDEX Technical Guide

1. Fundamental Syntax

CREATE [UNIQUE] [FULLTEXT] [SPATIAL] INDEX index_name
ON table_name (column1 [ASC|DESC], column2, ...)
[USING BTREE|HASH|GIST|SPGIST|GIN|BRIN]
[WITH (storage_parameter = value)]
[TABLESPACE tablespace_name]
[WHERE predicate];

Index Types: Regular, unique, full-text, spatial
Sort Direction: ASC (default) or DESC for each column
Access Methods: Algorithm-specific implementations (B-tree, Hash, etc.)
Partial Indexes: WHERE clause for filtered indexes

2. Index Types Deep Dive

B-tree (Balanced Tree)

CREATE INDEX idx_customer_name ON customers(last_name, first_name);

Default for most databases. Optimal for range queries and equality comparisons.

Hash Index

CREATE INDEX idx_product_hash ON products USING HASH (product_code);

Only exact matches, no range scans. Faster than B-tree for equality but no sorting.

GIN (Generalized Inverted Index)

CREATE INDEX idx_document_content ON documents USING GIN (to_tsvector('english', content));

For composite values (arrays, JSON, full-text search).

BRIN (Block Range INdex)

CREATE INDEX idx_sensor_time ON sensor_data USING BRIN (reading_time);

For very large tables with naturally sorted data. Smaller but less precise.

3. Advanced Index Features

Covering Indexes

-- PostgreSQL
CREATE INDEX idx_order_covering ON orders (customer_id) INCLUDE (order_date, total);

-- SQL Server
CREATE INDEX idx_order_covering ON orders(customer_id) INCLUDE (order_date, total);

Includes non-key columns to avoid table lookups.

Partial/Filtered Indexes

-- PostgreSQL/SQLite
CREATE INDEX idx_active_users ON users(email) WHERE active = true;

-- SQL Server
CREATE INDEX idx_active_users ON users(email) WHERE active = 1;

Smaller indexes by filtering rows during creation.

Functional Indexes

CREATE INDEX idx_name_lower ON users(LOWER(last_name));

Indexes on expressions rather than raw columns.

4. Performance Considerations

Index Selection Strategy

Prioritize columns in WHERE, JOIN, ORDER BY clauses
Leftmost prefix rule for multi-column indexes
Cardinality matters (high distinct values = better selectivity)

Maintenance Overhead

INSERT/UPDATE/DELETE operations become slower
Storage requirements increase (typically 5-20% of table size)
Vacuum/Reindex needs for bloat control

5. Database-Specific Implementations

PostgreSQL

CREATE INDEX CONCURRENTLY idx_users_email ON users(email);

Non-blocking index creation with CONCURRENTLY option.

SQL Server

CREATE CLUSTERED INDEX idx_orders_date ON orders(order_date)
WITH (DATA_COMPRESSION = PAGE, FILLFACTOR = 80);

Clustered indexes physically reorganize table data.

Oracle

CREATE BITMAP INDEX idx_products_category ON products(category_id)
TABLESPACE indx_ts;

Bitmap indexes for low-cardinality columns in data warehouses.

6. Index Management

Rebuilding/Reorganizing

-- PostgreSQL
REINDEX INDEX idx_orders_date;

-- SQL Server
ALTER INDEX idx_orders_date ON orders REBUILD;

-- MySQL
ANALYZE TABLE orders;

Usage Monitoring

-- PostgreSQL
SELECT * FROM pg_stat_user_indexes;

-- SQL Server
SELECT * FROM sys.dm_db_index_usage_stats;

-- MySQL
SHOW INDEX FROM orders;

Dropping Indexes

DROP INDEX IF EXISTS idx_old_data;

Advanced CREATE INDEX Topics

1. Index-Only Scans

-- PostgreSQL example with covering index
CREATE INDEX idx_orders_covering ON orders (customer_id, order_date) 
INCLUDE (total_amount);

EXPLAIN ANALYZE 
SELECT customer_id, order_date, total_amount 
FROM orders 
WHERE customer_id BETWEEN 1000 AND 2000;

Covering all columns: Queries avoid table lookups when all needed columns exist in the index
Visibility map: PostgreSQL uses this to check if heap access is truly needed
Performance benefit: Typically 2-10x faster than table scans
Storage tradeoff: Larger indexes consume more space and memory

2. Composite Index Column Ordering

-- Optimal for queries filtering on customer_id and sorting by date
CREATE INDEX idx_customer_orders ON orders (customer_id, order_date DESC);

-- Versus less optimal ordering
CREATE INDEX idx_orders_customer ON orders (order_date, customer_id);

Leftmost prefix rule: Index (A,B,C) can be used for queries on A, (A,B), or (A,B,C)
Cardinality principle: Place high-selectivity columns first
Sorting priority: Columns used in ORDER BY should appear after equality filter columns
Workload analysis: Optimize for your most frequent query patterns

3. Document Indexing (JSON/XML)

-- PostgreSQL JSONB indexing
CREATE INDEX idx_product_attributes ON products USING GIN (attributes);

-- SQL Server XML indexing
CREATE PRIMARY XML INDEX idx_xml_content ON docs(xml_column);
CREATE XML INDEX idx_xml_path ON docs(xml_column)
USING XML INDEX idx_xml_content FOR PATH;

GIN indexes: For JSONB array containment and existence queries
Path-specific indexes: Optimize for particular JSON/XML paths
Expression indexes: Extract specific fields for B-tree indexing
Storage overhead: Document indexes can be larger than relational indexes

4. Full-Text Search Indexing

-- PostgreSQL tsvector indexing
CREATE INDEX idx_document_search ON documents 
USING GIN (to_tsvector('english', content));

-- MySQL full-text index
CREATE FULLTEXT INDEX idx_product_description ON products(description)
WITH PARSER ngram;

-- SQL Server full-text catalog
CREATE FULLTEXT CATALOG ft_catalog AS DEFAULT;
CREATE FULLTEXT INDEX ON products(name, description) 
KEY INDEX pk_products ON ft_catalog;

Tokenization strategies: Language-specific word breaking and stemming
Ranking algorithms: TF-IDF, BM25, and other relevance scoring
Phrase search: Indexing configurations for exact phrase matching
Synonym expansion: Thesaurus configurations for broader matching

5. Spatial/GIS Indexing

-- PostGIS R-tree index
CREATE INDEX idx_geometries ON parcels USING GIST (geom);

-- SQL Server spatial index
CREATE SPATIAL INDEX idx_spatial_points ON locations(geography_point)
USING GEOGRAPHY_AUTO_GRID
WITH (BOUNDING_BOX = (-180, -90, 180, 90));

-- Oracle spatial index
CREATE INDEX idx_spatial_data ON spatial_table(geometry)
INDEXTYPE IS MDSYS.SPATIAL_INDEX;

R-tree/GiST: For hierarchical spatial data organization
Grid-based indexing: Partitioning space into grids for point data
Coverage queries: Optimizing "contains" and "intersects" operations
Distance searches: Indexing strategies for nearest-neighbor queries

6. Partitioned Table Indexing

-- PostgreSQL partitioned index
CREATE INDEX idx_partitioned_orders ON orders (order_date)
GLOBAL;  -- Or LOCAL for per-partition indexes

-- SQL Server partitioned index
CREATE INDEX idx_sales_date ON sales(order_date)
ON ps_monthly(order_date);

-- Oracle local partitioned index
CREATE INDEX idx_cust_orders_local ON orders(customer_id)
LOCAL TABLESPACE idx_ts;

Global vs local: Spanning all partitions vs per-partition indexes
Partition pruning: How indexes interact with partition elimination
Maintenance complexity: Rebuilding indexes on partitioned tables
Unique constraints: Special requirements for partitioned uniqueness