SQL INDEXES


SQL
POSTGRES
INTERACTIVE

Complete SQL CREATE INDEX Technical Guide

1. Fundamental Syntax
CREATE [UNIQUE] [FULLTEXT] [SPATIAL] INDEX index_name
ON table_name (column1 [ASC|DESC], column2, ...)
[USING BTREE|HASH|GIST|SPGIST|GIN|BRIN]
[WITH (storage_parameter = value)]
[TABLESPACE tablespace_name]
[WHERE predicate];
  1. Index Types: Regular, unique, full-text, spatial
  2. Sort Direction: ASC (default) or DESC for each column
  3. Access Methods: Algorithm-specific implementations (B-tree, Hash, etc.)
  4. Partial Indexes: WHERE clause for filtered indexes
2. Index Types Deep Dive
B-tree (Balanced Tree)
CREATE INDEX idx_customer_name ON customers(last_name, first_name);

Default for most databases. Optimal for range queries and equality comparisons.

Hash Index
CREATE INDEX idx_product_hash ON products USING HASH (product_code);

Only exact matches, no range scans. Faster than B-tree for equality but no sorting.

GIN (Generalized Inverted Index)
CREATE INDEX idx_document_content ON documents USING GIN (to_tsvector('english', content));

For composite values (arrays, JSON, full-text search).

BRIN (Block Range INdex)
CREATE INDEX idx_sensor_time ON sensor_data USING BRIN (reading_time);

For very large tables with naturally sorted data. Smaller but less precise.

3. Advanced Index Features
Covering Indexes
-- PostgreSQL
CREATE INDEX idx_order_covering ON orders (customer_id) INCLUDE (order_date, total);

-- SQL Server
CREATE INDEX idx_order_covering ON orders(customer_id) INCLUDE (order_date, total);

Includes non-key columns to avoid table lookups.

Partial/Filtered Indexes
-- PostgreSQL/SQLite
CREATE INDEX idx_active_users ON users(email) WHERE active = true;

-- SQL Server
CREATE INDEX idx_active_users ON users(email) WHERE active = 1;

Smaller indexes by filtering rows during creation.

Functional Indexes
CREATE INDEX idx_name_lower ON users(LOWER(last_name));

Indexes on expressions rather than raw columns.

4. Performance Considerations
Index Selection Strategy
  1. Prioritize columns in WHERE, JOIN, ORDER BY clauses
  2. Leftmost prefix rule for multi-column indexes
  3. Cardinality matters (high distinct values = better selectivity)
Maintenance Overhead
  1. INSERT/UPDATE/DELETE operations become slower
  2. Storage requirements increase (typically 5-20% of table size)
  3. Vacuum/Reindex needs for bloat control
5. Database-Specific Implementations
PostgreSQL
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);

Non-blocking index creation with CONCURRENTLY option.

SQL Server
CREATE CLUSTERED INDEX idx_orders_date ON orders(order_date)
WITH (DATA_COMPRESSION = PAGE, FILLFACTOR = 80);

Clustered indexes physically reorganize table data.

Oracle
CREATE BITMAP INDEX idx_products_category ON products(category_id)
TABLESPACE indx_ts;

Bitmap indexes for low-cardinality columns in data warehouses.

6. Index Management
Rebuilding/Reorganizing
-- PostgreSQL
REINDEX INDEX idx_orders_date;

-- SQL Server
ALTER INDEX idx_orders_date ON orders REBUILD;

-- MySQL
ANALYZE TABLE orders;
Usage Monitoring
-- PostgreSQL
SELECT * FROM pg_stat_user_indexes;

-- SQL Server
SELECT * FROM sys.dm_db_index_usage_stats;

-- MySQL
SHOW INDEX FROM orders;
Dropping Indexes
DROP INDEX IF EXISTS idx_old_data;

Advanced CREATE INDEX Topics

1. Index-Only Scans
-- PostgreSQL example with covering index
CREATE INDEX idx_orders_covering ON orders (customer_id, order_date) 
INCLUDE (total_amount);

EXPLAIN ANALYZE 
SELECT customer_id, order_date, total_amount 
FROM orders 
WHERE customer_id BETWEEN 1000 AND 2000;
  1. Covering all columns: Queries avoid table lookups when all needed columns exist in the index
  2. Visibility map: PostgreSQL uses this to check if heap access is truly needed
  3. Performance benefit: Typically 2-10x faster than table scans
  4. Storage tradeoff: Larger indexes consume more space and memory
2. Composite Index Column Ordering
-- Optimal for queries filtering on customer_id and sorting by date
CREATE INDEX idx_customer_orders ON orders (customer_id, order_date DESC);

-- Versus less optimal ordering
CREATE INDEX idx_orders_customer ON orders (order_date, customer_id);
  1. Leftmost prefix rule: Index (A,B,C) can be used for queries on A, (A,B), or (A,B,C)
  2. Cardinality principle: Place high-selectivity columns first
  3. Sorting priority: Columns used in ORDER BY should appear after equality filter columns
  4. Workload analysis: Optimize for your most frequent query patterns
3. Document Indexing (JSON/XML)
-- PostgreSQL JSONB indexing
CREATE INDEX idx_product_attributes ON products USING GIN (attributes);

-- SQL Server XML indexing
CREATE PRIMARY XML INDEX idx_xml_content ON docs(xml_column);
CREATE XML INDEX idx_xml_path ON docs(xml_column)
USING XML INDEX idx_xml_content FOR PATH;
  1. GIN indexes: For JSONB array containment and existence queries
  2. Path-specific indexes: Optimize for particular JSON/XML paths
  3. Expression indexes: Extract specific fields for B-tree indexing
  4. Storage overhead: Document indexes can be larger than relational indexes
4. Full-Text Search Indexing
-- PostgreSQL tsvector indexing
CREATE INDEX idx_document_search ON documents 
USING GIN (to_tsvector('english', content));

-- MySQL full-text index
CREATE FULLTEXT INDEX idx_product_description ON products(description)
WITH PARSER ngram;

-- SQL Server full-text catalog
CREATE FULLTEXT CATALOG ft_catalog AS DEFAULT;
CREATE FULLTEXT INDEX ON products(name, description) 
KEY INDEX pk_products ON ft_catalog;
  1. Tokenization strategies: Language-specific word breaking and stemming
  2. Ranking algorithms: TF-IDF, BM25, and other relevance scoring
  3. Phrase search: Indexing configurations for exact phrase matching
  4. Synonym expansion: Thesaurus configurations for broader matching
5. Spatial/GIS Indexing
-- PostGIS R-tree index
CREATE INDEX idx_geometries ON parcels USING GIST (geom);

-- SQL Server spatial index
CREATE SPATIAL INDEX idx_spatial_points ON locations(geography_point)
USING GEOGRAPHY_AUTO_GRID
WITH (BOUNDING_BOX = (-180, -90, 180, 90));

-- Oracle spatial index
CREATE INDEX idx_spatial_data ON spatial_table(geometry)
INDEXTYPE IS MDSYS.SPATIAL_INDEX;
  1. R-tree/GiST: For hierarchical spatial data organization
  2. Grid-based indexing: Partitioning space into grids for point data
  3. Coverage queries: Optimizing "contains" and "intersects" operations
  4. Distance searches: Indexing strategies for nearest-neighbor queries
6. Partitioned Table Indexing
-- PostgreSQL partitioned index
CREATE INDEX idx_partitioned_orders ON orders (order_date)
GLOBAL;  -- Or LOCAL for per-partition indexes

-- SQL Server partitioned index
CREATE INDEX idx_sales_date ON sales(order_date)
ON ps_monthly(order_date);

-- Oracle local partitioned index
CREATE INDEX idx_cust_orders_local ON orders(customer_id)
LOCAL TABLESPACE idx_ts;
  1. Global vs local: Spanning all partitions vs per-partition indexes
  2. Partition pruning: How indexes interact with partition elimination
  3. Maintenance complexity: Rebuilding indexes on partitioned tables
  4. Unique constraints: Special requirements for partitioned uniqueness