
PostgreSQL Performance for Developers — Indexes, EXPLAIN, and Common Mistakes

Practical PostgreSQL optimization techniques I've learned building production systems — from index strategy to query analysis to connection pooling.

Working on platforms with 70,000+ users taught me that most performance problems aren't application bugs — they're database problems. Here are the techniques I use to diagnose and fix them.

Start With EXPLAIN ANALYZE

Before touching any index, understand what the query planner is actually doing:

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, o.reference, o.status, c.name as company_name
FROM orders o
JOIN companies c ON c.id = o.company_id
WHERE o.status = 'PENDING'
  AND o.company_id = '550e8400-e29b-41d4-a716-446655440000'
ORDER BY o.created_at DESC
LIMIT 20;

The output tells you:

  • Seq Scan — reading the entire table. Bad on large tables.
  • Index Scan — using an index. Good.
  • Hash Join vs Nested Loop — the join strategy. Nested loop is better when one side is small.
  • Rows — estimated vs actual. Large discrepancy = stale statistics, run ANALYZE.
  • Buffers — how much data was read from disk vs cache. High disk reads = needs caching or better indexes.
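For the query above, a healthy plan might look roughly like this (illustrative output, assuming a composite index on (company_id, status, created_at DESC); real costs, row counts, and timings will differ):

```text
Limit  (cost=0.42..25.01 rows=20 width=72) (actual time=0.030..0.210 rows=20 loops=1)
  Buffers: shared hit=112
  ->  Nested Loop  (cost=0.42..610.55 rows=494 width=72) (actual time=0.029..0.204 rows=20 loops=1)
        ->  Index Scan using idx_orders_company_status_date on orders o
                (actual time=0.020..0.110 rows=20 loops=1)
              Index Cond: ((company_id = '550e8400-e29b-41d4-a716-446655440000'::uuid) AND (status = 'PENDING'))
        ->  Index Scan using companies_pkey on companies c
                (actual time=0.004..0.004 rows=1 loops=20)
Planning Time: 0.210 ms
Execution Time: 0.260 ms
```

Two things worth noticing: there is no Sort node, because the index already returns rows in created_at DESC order, and "shared hit" means every page came from cache rather than disk.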

Index Strategy

The Most Valuable Index: (foreign_key, status)

Most queries filter by both a foreign key and a status column. Postgres can combine two single-column indexes with a bitmap AND, but a single composite index is almost always faster:

-- Before: two separate indexes (inefficient)
CREATE INDEX idx_orders_company ON orders(company_id);
CREATE INDEX idx_orders_status ON orders(status);
 
-- After: one composite index (covers both filters)
CREATE INDEX idx_orders_company_status ON orders(company_id, status);
 
-- Even better if you always sort by created_at
CREATE INDEX idx_orders_company_status_date 
ON orders(company_id, status, created_at DESC);

Column order in composite indexes matters: put equality conditions first, range/sort conditions last.
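A sketch of what that ordering buys, using the composite index defined above:

```sql
-- Served entirely by idx_orders_company_status_date: both equality
-- filters match the leading columns, and created_at DESC matches the sort
SELECT id FROM orders
WHERE company_id = '550e8400-e29b-41d4-a716-446655440000'
  AND status = 'PENDING'
ORDER BY created_at DESC
LIMIT 20;

-- Not served by it: the leading column (company_id) is absent, so the
-- planner cannot descend the index and falls back to another plan
SELECT id FROM orders
WHERE status = 'PENDING'
ORDER BY created_at DESC
LIMIT 20;
```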

Partial Indexes

If 95% of your queries target status = 'ACTIVE' rows:

-- Index only active records — much smaller, faster
CREATE INDEX idx_users_active ON users(email) WHERE status = 'ACTIVE';

This is dramatically smaller and faster than a full index when inactive records are the majority. One caveat: the planner only considers a partial index when the query's WHERE clause implies the index predicate, so the query must also filter on status = 'ACTIVE'.
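Whether the planner can use a partial index depends on the query's own WHERE clause. A sketch, reusing the users table from the example (the email value is made up):

```sql
-- Can use idx_users_active: the WHERE clause implies the index predicate
SELECT id FROM users
WHERE email = 'alice@example.com' AND status = 'ACTIVE';

-- Cannot use it: without the status filter, inactive rows might match,
-- and those rows simply aren't in the index
SELECT id FROM users
WHERE email = 'alice@example.com';
```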

GIN Indexes for Arrays and JSONB

If you store skills or tags as an array:

-- Without an index: Seq Scan on every query. Note the operator:
-- 'React' = ANY(skills) cannot use a GIN index at all; use containment
SELECT * FROM jobs WHERE skills @> ARRAY['React'];
 
-- With a GIN index, containment queries become fast
CREATE INDEX idx_jobs_skills ON jobs USING GIN(skills);

For JSONB columns (common when migrating from MongoDB):

CREATE INDEX idx_profiles_meta ON profiles USING GIN(metadata jsonb_path_ops);
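The jsonb_path_ops operator class trades generality for size: it only accelerates containment (@>), while key-existence operators need the default jsonb_ops class. A sketch, assuming a hypothetical "plan" key inside metadata:

```sql
-- Accelerated: containment is the one operator jsonb_path_ops supports
SELECT * FROM profiles
WHERE metadata @> '{"plan": "pro"}';

-- Not accelerated by this index: key existence (?) needs plain jsonb_ops
SELECT * FROM profiles
WHERE metadata ? 'plan';
```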

Connection Pooling Is Not Optional

PostgreSQL's default behavior: one connection = one process. At scale, opening a new connection per request will bring your database to its knees.

Use PgBouncer as a connection pooler:

# pgbouncer.ini
[databases]
myapp = host=127.0.0.1 port=5432 dbname=myapp
 
[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25

With pool_mode = transaction, a database connection is only held during a transaction. 1000 concurrent app connections can share 25 real Postgres connections. This is how platforms with millions of users run on small databases.

With Prisma, point your DATABASE_URL at PgBouncer and add ?pgbouncer=true:

DATABASE_URL="postgresql://user:pass@pgbouncer:6432/myapp?pgbouncer=true&connection_limit=1"

N+1 Queries: The Silent Killer

The most common performance bug in ORM-heavy code:

// BAD — 1 query for orders + N queries for companies
const orders = await prisma.order.findMany({ where: { status: 'PENDING' } })
for (const order of orders) {
  const company = await prisma.company.findUnique({ where: { id: order.companyId } })
  // ...
}
 
// GOOD — 1 query with JOIN
const orders = await prisma.order.findMany({
  where: { status: 'PENDING' },
  include: { company: true },
})

Always check your query logs in development. I add this to every Node.js project:

const prisma = new PrismaClient({
  log: process.env.NODE_ENV === 'development' 
    ? ['query', 'warn', 'error'] 
    : ['warn', 'error'],
})

Vacuuming and Table Bloat

PostgreSQL uses MVCC: deleted and updated rows aren't removed immediately; they become "dead tuples". Autovacuum cleans these up in the background, but high-write tables can accumulate bloat faster than the default settings keep up with.

Check table bloat:

SELECT 
  schemaname,
  tablename,
  n_dead_tup,
  n_live_tup,
  round(n_dead_tup::numeric / NULLIF(n_live_tup + n_dead_tup, 0) * 100, 2) AS dead_pct
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

If dead_pct is above 20%, tune autovacuum or run VACUUM ANALYZE tablename manually.
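Tuning usually means lowering the scale factors for the affected table rather than changing them globally. A sketch of per-table settings (the thresholds here are illustrative, not a recommendation for every workload):

```sql
-- Per-table override: vacuum at 5% dead rows instead of the 20% default,
-- and refresh planner statistics at 2% changed rows
ALTER TABLE orders SET (
  autovacuum_vacuum_scale_factor = 0.05,
  autovacuum_analyze_scale_factor = 0.02
);

-- One-off manual cleanup that also refreshes statistics
VACUUM (ANALYZE) orders;
```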

Quick Wins Checklist

  • Run EXPLAIN ANALYZE before adding any index
  • Add composite indexes for (fk, status) pairs you filter often
  • Use PgBouncer in transaction mode in production
  • Log slow queries: log_min_duration_statement = 500 in postgresql.conf
  • Set work_mem appropriately for sort-heavy queries (default 4MB is often too low)
  • Run ANALYZE after bulk inserts
  • Use RETURNING to avoid extra SELECT after INSERT/UPDATE
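The last item fits in one statement: RETURNING hands back generated values without a second round trip. A sketch against the orders table from earlier, assuming id, reference, and created_at are filled in by column defaults:

```sql
-- One round trip instead of INSERT followed by SELECT
INSERT INTO orders (company_id, status)
VALUES ('550e8400-e29b-41d4-a716-446655440000', 'PENDING')
RETURNING id, reference, created_at;
```

The same clause works on UPDATE and DELETE, which makes it handy for returning the final state of rows you just touched.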

The database is the foundation. Neglecting it means hitting invisible walls as you scale.