SQL INDEXES
Complete SQL CREATE INDEX Technical Guide
1. Fundamental Syntax
CREATE [UNIQUE] [FULLTEXT] [SPATIAL] INDEX index_name ON table_name (column1 [ASC|DESC], column2, ...) [USING BTREE|HASH|GIST|SPGIST|GIN|BRIN] [WITH (storage_parameter = value)] [TABLESPACE tablespace_name] [WHERE predicate];
- Index Types: Regular, unique, full-text, spatial
- Sort Direction: ASC (default) or DESC for each column
- Access Methods: Algorithm-specific implementations (B-tree, Hash, etc.)
- Partial Indexes: WHERE clause for filtered indexes
2. Index Types Deep Dive
B-tree (Balanced Tree)
CREATE INDEX idx_customer_name ON customers(last_name, first_name);
Default for most databases. Optimal for range queries and equality comparisons.
Hash Index
CREATE INDEX idx_product_hash ON products USING HASH (product_code);
Only exact matches, no range scans. Faster than B-tree for equality but no sorting.
GIN (Generalized Inverted Index)
CREATE INDEX idx_document_content ON documents USING GIN (to_tsvector('english', content));
For composite values (arrays, JSON, full-text search).
BRIN (Block Range INdex)
CREATE INDEX idx_sensor_time ON sensor_data USING BRIN (reading_time);
For very large tables with naturally sorted data. Smaller but less precise.
3. Advanced Index Features
Covering Indexes
-- PostgreSQL CREATE INDEX idx_order_covering ON orders (customer_id) INCLUDE (order_date, total); -- SQL Server CREATE INDEX idx_order_covering ON orders(customer_id) INCLUDE (order_date, total);
Includes non-key columns to avoid table lookups.
Partial/Filtered Indexes
-- PostgreSQL/SQLite CREATE INDEX idx_active_users ON users(email) WHERE active = true; -- SQL Server CREATE INDEX idx_active_users ON users(email) WHERE active = 1;
Smaller indexes by filtering rows during creation.
Functional Indexes
CREATE INDEX idx_name_lower ON users(LOWER(last_name));
Indexes on expressions rather than raw columns.
4. Performance Considerations
Index Selection Strategy
- Prioritize columns in WHERE, JOIN, ORDER BY clauses
- Leftmost prefix rule for multi-column indexes
- Cardinality matters (high distinct values = better selectivity)
Maintenance Overhead
- INSERT/UPDATE/DELETE operations become slower
- Storage requirements increase (typically 5-20% of table size)
- Vacuum/Reindex needs for bloat control
5. Database-Specific Implementations
PostgreSQL
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);
Non-blocking index creation with CONCURRENTLY option.
SQL Server
CREATE CLUSTERED INDEX idx_orders_date ON orders(order_date) WITH (DATA_COMPRESSION = PAGE, FILLFACTOR = 80);
Clustered indexes physically reorganize table data.
Oracle
CREATE BITMAP INDEX idx_products_category ON products(category_id) TABLESPACE indx_ts;
Bitmap indexes for low-cardinality columns in data warehouses.
6. Index Management
Rebuilding/Reorganizing
-- PostgreSQL REINDEX INDEX idx_orders_date; -- SQL Server ALTER INDEX idx_orders_date ON orders REBUILD; -- MySQL ANALYZE TABLE orders;
Usage Monitoring
-- PostgreSQL SELECT * FROM pg_stat_user_indexes; -- SQL Server SELECT * FROM sys.dm_db_index_usage_stats; -- MySQL SHOW INDEX FROM orders;
Dropping Indexes
DROP INDEX IF EXISTS idx_old_data;
Advanced CREATE INDEX Topics
1. Index-Only Scans
-- PostgreSQL example with covering index CREATE INDEX idx_orders_covering ON orders (customer_id, order_date) INCLUDE (total_amount); EXPLAIN ANALYZE SELECT customer_id, order_date, total_amount FROM orders WHERE customer_id BETWEEN 1000 AND 2000;
- Covering all columns: Queries avoid table lookups when all needed columns exist in the index
- Visibility map: PostgreSQL uses this to check if heap access is truly needed
- Performance benefit: Typically 2-10x faster than table scans
- Storage tradeoff: Larger indexes consume more space and memory
2. Composite Index Column Ordering
-- Optimal for queries filtering on customer_id and sorting by date CREATE INDEX idx_customer_orders ON orders (customer_id, order_date DESC); -- Versus less optimal ordering CREATE INDEX idx_orders_customer ON orders (order_date, customer_id);
- Leftmost prefix rule: Index (A,B,C) can be used for queries on A, (A,B), or (A,B,C)
- Cardinality principle: Place high-selectivity columns first
- Sorting priority: Columns used in ORDER BY should appear after equality filter columns
- Workload analysis: Optimize for your most frequent query patterns
3. Document Indexing (JSON/XML)
-- PostgreSQL JSONB indexing CREATE INDEX idx_product_attributes ON products USING GIN (attributes); -- SQL Server XML indexing CREATE PRIMARY XML INDEX idx_xml_content ON docs(xml_column); CREATE XML INDEX idx_xml_path ON docs(xml_column) USING XML INDEX idx_xml_content FOR PATH;
- GIN indexes: For JSONB array containment and existence queries
- Path-specific indexes: Optimize for particular JSON/XML paths
- Expression indexes: Extract specific fields for B-tree indexing
- Storage overhead: Document indexes can be larger than relational indexes
4. Full-Text Search Indexing
-- PostgreSQL tsvector indexing CREATE INDEX idx_document_search ON documents USING GIN (to_tsvector('english', content)); -- MySQL full-text index CREATE FULLTEXT INDEX idx_product_description ON products(description) WITH PARSER ngram; -- SQL Server full-text catalog CREATE FULLTEXT CATALOG ft_catalog AS DEFAULT; CREATE FULLTEXT INDEX ON products(name, description) KEY INDEX pk_products ON ft_catalog;
- Tokenization strategies: Language-specific word breaking and stemming
- Ranking algorithms: TF-IDF, BM25, and other relevance scoring
- Phrase search: Indexing configurations for exact phrase matching
- Synonym expansion: Thesaurus configurations for broader matching
5. Spatial/GIS Indexing
-- PostGIS R-tree index CREATE INDEX idx_geometries ON parcels USING GIST (geom); -- SQL Server spatial index CREATE SPATIAL INDEX idx_spatial_points ON locations(geography_point) USING GEOGRAPHY_AUTO_GRID WITH (BOUNDING_BOX = (-180, -90, 180, 90)); -- Oracle spatial index CREATE INDEX idx_spatial_data ON spatial_table(geometry) INDEXTYPE IS MDSYS.SPATIAL_INDEX;
- R-tree/GiST: For hierarchical spatial data organization
- Grid-based indexing: Partitioning space into grids for point data
- Coverage queries: Optimizing "contains" and "intersects" operations
- Distance searches: Indexing strategies for nearest-neighbor queries
6. Partitioned Table Indexing
-- PostgreSQL partitioned index CREATE INDEX idx_partitioned_orders ON orders (order_date) GLOBAL; -- Or LOCAL for per-partition indexes -- SQL Server partitioned index CREATE INDEX idx_sales_date ON sales(order_date) ON ps_monthly(order_date); -- Oracle local partitioned index CREATE INDEX idx_cust_orders_local ON orders(customer_id) LOCAL TABLESPACE idx_ts;
- Global vs local: Spanning all partitions vs per-partition indexes
- Partition pruning: How indexes interact with partition elimination
- Maintenance complexity: Rebuilding indexes on partitioned tables
- Unique constraints: Special requirements for partitioned uniqueness