Have a slow SELECT query? Add an index! They're the go-to solution for speeding up data retrieval, and for good reason. But like any powerful tool, they can be misused. The belief that "more is always better" can lead to a situation where your performance tuning actually slows your database down.
So, can creating too many indexes on a table become a problem? Yes, absolutely. Over-indexing is a real issue with significant performance costs. Let's break down the trade-off.
A Quick Refresher: What's an Index? 💡
Think of a database index like the index at the back of a textbook. Instead of reading the entire book page-by-page (a "full table scan") to find every mention of a specific topic, you can go to the index, find the topic, and get the exact page numbers you need.
In a database, an index does the same for your data. It's a special data structure that stores the values of a specific column (or columns) in sorted order, with a pointer back to the original table row. This makes operations like WHERE, JOIN, and ORDER BY incredibly fast.
-- Without index: Full table scan
SELECT * FROM users WHERE email = 'john@example.com';
-- With index on email: Direct lookup
CREATE INDEX idx_users_email ON users(email);
SELECT * FROM users WHERE email = 'john@example.com';
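You can watch this switch from full scan to direct lookup happen in any SQL engine's query plan. Here's a minimal sketch using Python's built-in sqlite3 module (the table and data are made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
con.executemany("INSERT INTO users (email) VALUES (?)",
                [(f"user{i}@example.com",) for i in range(1000)])

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row describes the step taken
    return " ".join(row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM users WHERE email = 'user42@example.com'"
before = plan(query)  # without the index: a full table scan ("SCAN users")
con.execute("CREATE INDEX idx_users_email ON users(email)")
after = plan(query)   # with the index: a lookup "USING INDEX idx_users_email"
print(before)
print(after)
```

The exact wording of the plan varies between SQLite versions, but the shift from a SCAN step to a SEARCH step using the new index is the same story EXPLAIN tells in PostgreSQL or MySQL.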
The Real Cost: The Write Penalty ⚖️
Here's the catch: that helpful index doesn't maintain itself for free. While indexes make reads faster, they make writes slower.
Every time you perform a write operation on an indexed table, the database has more work to do:
INSERT Operations
When you add a new row to the table, the database must also add a new entry to every single index on that table to ensure they all stay up-to-date. If you have 10 indexes, you're not doing one write operation; you're doing 11 (1 for the table + 10 for the indexes).
-- Table with 5 indexes
INSERT INTO products (name, price, category, brand, created_at)
VALUES ('Laptop', 999.99, 'Electronics', 'TechBrand', NOW());
-- Behind the scenes: 6 write operations
-- 1. Insert into products table
-- 2. Update idx_products_name
-- 3. Update idx_products_price
-- 4. Update idx_products_category
-- 5. Update idx_products_brand
-- 6. Update idx_products_created_at
DELETE Operations
When you delete a row, the database must also find and delete the corresponding entry from every single index.
UPDATE Operations
This can be the costliest. If you update a non-indexed column, it's fine. But if you update a value in an indexed column, the database has to treat it like a DELETE followed by an INSERT within each affected index to maintain the correct sort order. This can cause significant overhead.
-- Updating an indexed column
UPDATE products
SET price = 899.99
WHERE id = 123;
-- If price is indexed, this requires:
-- 1. Remove old price entry from index
-- 2. Insert new price entry in correct position
-- 3. Update table row
Imagine our textbook analogy again. If you have just one index at the back, writing a new paragraph is simple. But if you also have an index of every proper noun, an index of every verb, and an index of every date mentioned, writing that same paragraph now requires you to update four different indexes. It's a lot more work! 🐌
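The write penalty is easy to measure yourself. Below is a rough sketch using Python's sqlite3: it bulk-inserts the same rows into a products table carrying zero versus five extra indexes and times both (the table layout and timings are illustrative, not a benchmark of any particular engine):

```python
import sqlite3
import time

def bulk_insert(n_indexes, n_rows=20_000):
    """Insert n_rows into a products table carrying n_indexes extra
    single-column indexes; return (elapsed_seconds, final_row_count)."""
    con = sqlite3.connect(":memory:")
    con.execute("""CREATE TABLE products (
        id INTEGER PRIMARY KEY, name TEXT, price REAL,
        category TEXT, brand TEXT, created_at TEXT)""")
    cols = ["name", "price", "category", "brand", "created_at"]
    for col in cols[:n_indexes]:  # every one of these must be updated per insert
        con.execute(f"CREATE INDEX idx_products_{col} ON products({col})")
    rows = [(f"item{i}", i * 0.5, f"cat{i % 10}", f"brand{i % 7}", "2024-01-01")
            for i in range(n_rows)]
    start = time.perf_counter()
    con.executemany(
        "INSERT INTO products (name, price, category, brand, created_at) "
        "VALUES (?, ?, ?, ?, ?)", rows)
    con.commit()
    elapsed = time.perf_counter() - start
    count = con.execute("SELECT COUNT(*) FROM products").fetchone()[0]
    return elapsed, count

t0, n0 = bulk_insert(0)
t5, n5 = bulk_insert(5)
print(f"no extra indexes: {t0:.3f}s, five indexes: {t5:.3f}s")
```

On most machines the five-index run takes noticeably longer for the exact same data: each insert is one table write plus five index writes.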
The Other Hidden Costs
The write penalty is the main problem, but there are other costs to consider:
Storage Overhead
Indexes are not just metadata; they are data structures that take up disk space. On large tables, multiple indexes can consume gigabytes of storage, increasing costs for storage and backups.
-- Check index sizes in PostgreSQL
SELECT
indexname,
pg_size_pretty(pg_relation_size(indexname::regclass)) as size
FROM pg_indexes
WHERE tablename = 'large_table';
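If you don't have a PostgreSQL instance handy, the same effect is visible as raw file growth in an on-disk SQLite database. A small sketch (table name and row counts are made up):

```python
import os
import sqlite3
import tempfile

# Hypothetical on-disk database; index storage shows up as file growth.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
con.executemany("INSERT INTO events (payload) VALUES (?)",
                [("x" * 100,) for _ in range(10_000)])
con.commit()
size_before = os.path.getsize(path)

# The index duplicates the payload values into a second on-disk structure
con.execute("CREATE INDEX idx_events_payload ON events(payload)")
con.commit()
con.close()
size_after = os.path.getsize(path)
print(size_before, size_after)
```

The file grows by roughly the size of the indexed column's data plus pointer overhead, and every backup of this database now carries that extra weight too.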
Maintenance Overhead
Indexes can become fragmented over time, requiring maintenance jobs (like REINDEX or OPTIMIZE TABLE) to keep them efficient. These operations consume CPU and I/O resources.
-- PostgreSQL maintenance
REINDEX INDEX idx_products_name;
-- MySQL maintenance
OPTIMIZE TABLE products;
Query Planner Complexity
With too many indexes to choose from, the database's query planner has more candidate plans to evaluate, which adds planning overhead, and it may occasionally pick a suboptimal index for a particular query, leading to unexpectedly poor performance.
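You can see the planner making this choice: given several plausible indexes, it commits to one per table access and filters on the rest. A sketch with sqlite3 (the orders table and its indexes are hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT)""")
con.executemany("INSERT INTO orders (customer_id, status) VALUES (?, ?)",
                [(i % 500, "pending" if i % 3 else "shipped")
                 for i in range(5000)])
# Two candidate indexes both match the query's WHERE clause
con.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")
con.execute("CREATE INDEX idx_orders_status ON orders(status)")
con.execute("ANALYZE")  # give the planner statistics to choose with

plan = " ".join(row[-1] for row in con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders "
    "WHERE customer_id = 42 AND status = 'pending'"))
print(plan)  # only one of the two candidate indexes is actually used
```

Whether the planner picks the better of the two depends on how fresh its statistics are, which is exactly why a pile of overlapping indexes can backfire.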
Finding the "Goldilocks" Zone: An Indexing Strategy 🚀
The goal is not to avoid indexes, but to be intentional. We want the "just right" amount.
1. Analyze Your Workload
Is your application read-heavy or write-heavy?
- An analytics platform with few writes can afford more indexes
- A high-throughput transaction processing system (like an e-commerce checkout) that performs thousands of writes per second needs fewer indexes
-- Monitor read vs write patterns
SELECT
schemaname,
tablename,
seq_scan, -- Sequential scans (reads)
seq_tup_read, -- Tuples read sequentially
idx_scan, -- Index scans (reads)
n_tup_ins, -- Inserts (writes)
n_tup_upd, -- Updates (writes)
n_tup_del -- Deletes (writes)
FROM pg_stat_user_tables;
2. Index for Specific Queries
Don't just index columns because you think you might need to. Use your database's query analysis tools (like EXPLAIN or EXPLAIN ANALYZE in PostgreSQL) to identify slow queries and add indexes that directly support the WHERE, JOIN, or ORDER BY clauses in those specific queries.
-- Analyze query performance
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE customer_id = 123
AND status = 'pending'
ORDER BY created_at DESC;
-- Create targeted composite index
CREATE INDEX idx_orders_customer_status_created
ON orders(customer_id, status, created_at DESC);
3. Use Composite Indexes
Instead of creating separate indexes on column_A and column_B, consider whether a single composite index on (column_A, column_B) would better serve your most common queries. This one index can often satisfy queries filtering on column_A alone, as well as queries filtering on both column_A and column_B.
-- Instead of two separate indexes:
-- CREATE INDEX idx_orders_customer ON orders(customer_id);
-- CREATE INDEX idx_orders_status ON orders(status);
-- Use one composite index:
CREATE INDEX idx_orders_customer_status ON orders(customer_id, status);
-- This index can serve:
-- WHERE customer_id = 123
-- WHERE customer_id = 123 AND status = 'pending'
-- But NOT: WHERE status = 'pending' (need customer_id first)
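The left-prefix rule above is checkable directly from the query plan. A sketch with sqlite3 mirroring the orders example (data is made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT, total REAL)""")
con.executemany(
    "INSERT INTO orders (customer_id, status, total) VALUES (?, ?, ?)",
    [(i % 100, "pending" if i % 2 else "shipped", i * 1.0)
     for i in range(2000)])
con.execute(
    "CREATE INDEX idx_orders_customer_status ON orders(customer_id, status)")

def plan(where):
    return " ".join(r[-1] for r in con.execute(
        f"EXPLAIN QUERY PLAN SELECT * FROM orders WHERE {where}"))

p1 = plan("customer_id = 7")                         # leading column: usable
p2 = plan("customer_id = 7 AND status = 'pending'")  # both columns: usable
p3 = plan("status = 'pending'")                      # no leading column: scan
print(p1)
print(p2)
print(p3)
```

The first two queries SEARCH via idx_orders_customer_status; the third, lacking the leading customer_id, is left doing a scan.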
4. Monitor and Prune Unused Indexes
Most modern databases provide tools to identify indexes that are rarely or never used by the query planner. Regularly review these and drop the ones that aren't providing value. They are causing write overhead for zero read benefit.
-- Find unused indexes in PostgreSQL
SELECT
indexrelname as index_name,
relname as table_name,
idx_scan as index_scans,
idx_tup_read as tuples_read,
idx_tup_fetch as tuples_fetched
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY relname, indexrelname;
-- Drop unused indexes
DROP INDEX idx_unused_column;
Performance Monitoring Tools
Database-Specific Tools
PostgreSQL:
-- Query performance (PostgreSQL 12 and earlier;
-- on 13+ the columns are total_exec_time / mean_exec_time)
SELECT query, calls, total_time, mean_time
FROM pg_stat_statements
ORDER BY total_time DESC LIMIT 10;
-- Index usage statistics
SELECT * FROM pg_stat_user_indexes
WHERE idx_scan < 10;
MySQL:
-- Slow query log analysis (requires slow_query_log = ON and log_output = 'TABLE')
SELECT * FROM mysql.slow_log
ORDER BY start_time DESC LIMIT 10;
-- Index usage
SELECT * FROM information_schema.statistics
WHERE table_name = 'your_table';
Best Practices Summary
| ✅ Do | ❌ Don't |
|-----------|-------------|
| Index frequently queried columns | Index every column "just in case" |
| Use composite indexes for multi-column queries | Create redundant single-column indexes |
| Monitor index usage regularly | Set and forget indexes |
| Consider query patterns in design | Add indexes without analyzing queries |
| Balance read vs write performance | Ignore write performance impact |
Real-World Example
-- E-commerce orders table
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INTEGER NOT NULL,
status VARCHAR(50) NOT NULL,
total_amount DECIMAL(10,2),
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
-- Strategic indexing based on query patterns:
-- 1. Customer order history (frequent)
CREATE INDEX idx_orders_customer_created
ON orders(customer_id, created_at DESC);
-- 2. Admin status filtering (frequent)
CREATE INDEX idx_orders_status_created
ON orders(status, created_at DESC)
WHERE status IN ('pending', 'processing');
-- 3. Amount-based queries (occasional)
CREATE INDEX idx_orders_amount
ON orders(total_amount)
WHERE total_amount > 1000;
-- Avoid: Indexing every column individually
-- ❌ CREATE INDEX idx_orders_updated ON orders(updated_at);
-- ❌ CREATE INDEX idx_orders_id ON orders(id); -- Already have PK
Final Thought
Treat database indexes like a high-performance specialty tool, not a blunt instrument. They are a classic example of a space-time trade-off. By adding an index, you are trading faster read times for slower write times and increased storage.
The key is to make that trade consciously and ensure the performance gain on your critical read queries is worth the penalty on your writes.
Remember: The best index is the one that serves your actual query patterns, not the one that looks good on paper.
Database performance is about making informed trade-offs, not following blanket rules.