The Problem No One Sees Until It's Too Late
Database performance rarely degrades all at once. It creeps up gradually: a query that took 80ms at launch takes 300ms after six months of data growth, then 900ms a year later. By the time users notice slowness, the problem is already severe.
We see this pattern repeatedly at MediaFront. The good news: most database performance issues stem from the same handful of root causes, and fixing them systematically delivers dramatic improvements without hardware upgrades.
Start With Measurement, Not Guesses
Every optimisation project begins the same way: instrument before you optimise. Guessing which query is slow wastes weeks. Your tools:
- PostgreSQL — the pg_stat_statements extension tracks cumulative query execution time across all calls
- MySQL — the slow_query_log with long_query_time = 0.1 captures everything above 100ms
- SQL Server — Query Store provides a GUI over the same data
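If pg_stat_statements isn't already enabled, setup is a config change plus an extension. A minimal sketch (the preload setting requires a server restart):

-- In postgresql.conf (restart required):
--   shared_preload_libraries = 'pg_stat_statements'
-- Then, in the target database:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;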
-- Find your 10 most expensive queries in PostgreSQL
SELECT
    query,
    calls,
    round(total_exec_time::numeric, 2) AS total_ms,
    round(mean_exec_time::numeric, 2) AS avg_ms,
    round(stddev_exec_time::numeric, 2) AS stddev_ms
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
Fix the queries at the top of this list first. A 10× improvement on a query called 10,000 times per day is worth far more than a 100× improvement on one called 20 times.
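Illustrative arithmetic: a 100ms query called 10,000 times a day consumes roughly 17 minutes of database time, so a 10× speedup recovers about 15 of them. A 1-second query called 20 times consumes 20 seconds total, so even a 100× speedup recovers under 20 seconds.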
Query Optimisation: The High-ROI Fixes
Always Run EXPLAIN ANALYZE
Before touching an index or schema, understand how the database executes the query:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT
    c.customer_id,
    c.name,
    COUNT(o.order_id) AS order_count,
    SUM(o.total_amount) AS total_spent
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE c.signup_date > '2024-01-01'
GROUP BY c.customer_id, c.name
ORDER BY total_spent DESC
LIMIT 100;
Red flags in the output:
- Seq Scan on a large table — almost always an index opportunity
- Hash Join with large Batches — memory pressure; increase work_mem or add an index
- High Buffers: shared read — data not in cache; consider a covering index
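For the memory-pressure case, work_mem can be raised for a single session or transaction before reaching for a global setting change. A minimal sketch:

-- Session-scoped: the global default stays untouched
SET work_mem = '64MB';

-- Or scope it to a single transaction
BEGIN;
SET LOCAL work_mem = '64MB';
-- ... run the hash-join-heavy query here ...
COMMIT;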
Stop Using SELECT *
This single habit kills performance in multiple ways: it fetches columns you'll never use, prevents covering-index optimisation, and bloats network payloads.
-- ❌ Fetches all 22 columns including large TEXT fields
SELECT * FROM products WHERE category_id = 5;
-- ✅ Fetches exactly what the UI renders
SELECT product_id, name, price, stock_qty
FROM products
WHERE category_id = 5;
Rewrite Correlated Subqueries as Joins
Correlated subqueries execute once per row of the outer query. On a table with 500,000 rows, a correlated subquery that takes 0.5ms runs for 250 seconds total.
-- ❌ Correlated subquery — executes once per outer row
SELECT name,
       (SELECT MAX(order_date) FROM orders WHERE customer_id = c.id) AS last_order
FROM customers c;

-- ✅ Single-pass join
SELECT c.name, o.last_order
FROM customers c
LEFT JOIN (
    SELECT customer_id, MAX(order_date) AS last_order
    FROM orders
    GROUP BY customer_id
) o ON c.id = o.customer_id;
Indexing Strategy
Indexes are the single highest-leverage tool in your optimisation arsenal—but only when used intentionally.
The Index Decision Framework
Create an index when:
- The column appears in a WHERE, JOIN ON, or ORDER BY clause
- The column has high cardinality (many distinct values — see the quick check after these lists)
- The table has more than ~10,000 rows
- The read/write ratio for this table favours reads
Avoid indexes when:
- The table is write-heavy (each index slows every INSERT/UPDATE/DELETE)
- The column has very low cardinality (e.g., a boolean is_active)
- The query planner already uses a more selective index
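A quick cardinality sanity check before creating an index (a sketch; products and category_id stand in for your own table and column):

-- Ratio near 1.0 = high cardinality (strong candidate);
-- near 0.0 = low cardinality (weak candidate)
SELECT COUNT(DISTINCT category_id)::numeric
     / NULLIF(COUNT(*), 0) AS selectivity
FROM products;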
Covering Indexes Eliminate Table Lookups
A covering index includes all columns a query needs, so PostgreSQL never touches the main table:
-- Query: find a customer's recent orders by status
SELECT order_id, order_date, total_amount
FROM orders
WHERE customer_id = 12345
AND status = 'pending'
ORDER BY order_date DESC;
-- Covering index — all queried columns live in the index
CREATE INDEX idx_orders_customer_status_date_amount
ON orders (customer_id, status, order_date DESC)
INCLUDE (total_amount);
The INCLUDE clause (PostgreSQL 11+) adds total_amount to leaf pages without affecting sort order, making this a true zero-heap-access query.
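To verify, re-run EXPLAIN and look for an Index Only Scan with zero heap fetches (heap fetches can stay above zero until VACUUM updates the visibility map):

EXPLAIN (ANALYZE, BUFFERS)
SELECT order_id, order_date, total_amount
FROM orders
WHERE customer_id = 12345
  AND status = 'pending'
ORDER BY order_date DESC;
-- Expect: Index Only Scan using idx_orders_customer_status_date_amount
--         Heap Fetches: 0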
Partial Indexes for Selective Conditions
When you only ever query a subset of rows, index only that subset:
-- Only active, unfulfilled orders matter for the operations dashboard
CREATE INDEX idx_orders_unfulfilled
ON orders (customer_id, created_at)
WHERE status IN ('pending', 'processing');
This index is a fraction of the size of a full index on those columns, making it faster to build, cheaper to maintain, and more likely to stay in memory.
Prune Dead Indexes
Unused indexes are pure cost: they consume disk, slow writes, and confuse the planner. Find them:
SELECT
    schemaname,
    relname AS tablename,
    indexrelname AS indexname,
    idx_scan AS scans_since_restart
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelname NOT LIKE '%_pkey'
ORDER BY schemaname, relname;
Drop any index that shows zero scans, provided it wasn't created recently and the counters cover a representative window: they reset on every server restart, and an index used only by a monthly job will look idle for weeks.
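In PostgreSQL, drop with CONCURRENTLY so the table isn't locked against writes while the index is removed (it cannot run inside a transaction block; the index name here is hypothetical):

DROP INDEX CONCURRENTLY idx_orders_legacy_status;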
Caching: The Force Multiplier
Once queries are fast, caching keeps frequently-read data out of the database entirely.
Application-Level Caching with Redis
public async Task<Product?> GetProductAsync(int id)
{
    var cacheKey = $"product:{id}";

    // Try the cache first
    var cached = await _cache.GetStringAsync(cacheKey);
    if (cached is not null)
        return JsonSerializer.Deserialize<Product>(cached);

    // Cache miss — load from the database without change tracking
    var product = await _db.Products
        .AsNoTracking()
        .Include(p => p.Category)
        .FirstOrDefaultAsync(p => p.Id == id);

    // Only cache real hits; a 15-minute TTL bounds staleness
    if (product is not null)
    {
        await _cache.SetStringAsync(
            cacheKey,
            JsonSerializer.Serialize(product),
            new DistributedCacheEntryOptions
            {
                AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(15)
            });
    }

    return product;
}
Cache TTL is the hardest decision. Our rule of thumb: start at half the expected staleness tolerance (if product data may be up to 30 minutes stale, start with the 15-minute TTL above), then tune based on cache-hit ratio monitoring.
Materialised Views for Analytics
Complex analytical queries over large tables should never run live against OLTP data. Materialise the result:
CREATE MATERIALIZED VIEW mv_monthly_revenue AS
SELECT
    DATE_TRUNC('month', order_date) AS month,
    product_category,
    SUM(amount) AS revenue,
    COUNT(*) AS orders
FROM orders o
JOIN products p ON o.product_id = p.id
GROUP BY 1, 2;

-- CONCURRENTLY (below) requires a unique index on the view
CREATE UNIQUE INDEX ON mv_monthly_revenue (month, product_category);

-- Refresh nightly (or on-demand after large data loads)
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_monthly_revenue;
CONCURRENTLY lets reads continue during the refresh — zero downtime, at the cost of requiring the unique index created above.
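One way to automate the nightly refresh, assuming the pg_cron extension is installed (an assumption; any external scheduler works equally well):

-- Refresh every night at 02:00
SELECT cron.schedule(
    'refresh-monthly-revenue',   -- job name (our choice)
    '0 2 * * *',                 -- standard cron syntax
    'REFRESH MATERIALIZED VIEW CONCURRENTLY mv_monthly_revenue'
);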
Real-World Results: E-Commerce Platform Optimisation
An e-commerce client came to us handling 50,000+ daily orders with page timeouts during flash sales. Our four-week audit and remediation:
| Action | Impact |
|---|---|
| Rewrote 15 highest-cost queries | −60% DB CPU usage |
| Added 8 strategic covering indexes | Avg query time: 1200ms → 85ms |
| Removed 5 unused indexes | Write throughput +18% |
| Added Redis for product catalogue | Cache hit rate 94% |
| Partitioned orders table by month | Flash sale timeouts eliminated |
The business outcome: a 24% increase in conversion rate—driven entirely by improved responsiveness. No new hardware was purchased.
Schema Design: Getting It Right Upstream
Optimising a poorly designed schema is fighting uphill. Principles that pay off long-term:
- Normalise transactional tables — eliminating update anomalies is worth the occasional join
- Denormalise for reporting — create dedicated summary tables or materialised views rather than forcing complex joins into every report query
- Partition large tables — partition orders and events tables by date range so old data never slows current queries (see the sketch after this list)
- Use appropriate data types — BIGINT for IDs that will exceed 2 billion, TIMESTAMPTZ for timestamps (always store UTC), NUMERIC(10,2) for money (never FLOAT)
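A minimal declarative-partitioning sketch for the orders case (PostgreSQL 10+; column names assumed from the earlier examples):

-- Parent table is partitioned by order date
CREATE TABLE orders (
    order_id     BIGINT NOT NULL,
    customer_id  BIGINT NOT NULL,
    order_date   TIMESTAMPTZ NOT NULL,
    total_amount NUMERIC(10,2) NOT NULL
) PARTITION BY RANGE (order_date);

-- One partition per month; queries filtered on recent dates
-- scan only the matching partitions
CREATE TABLE orders_2024_01 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE orders_2024_02 PARTITION OF orders
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');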
Database performance is a product of schema design, query patterns, and infrastructure working together. Fix all three, and you'll handle 10× the traffic on the same hardware.