How a Query Optimizer Decides
When you write SELECT * FROM orders JOIN users ON orders.user_id = users.id WHERE users.active = 1, you’re giving the database a what, not a how. The optimizer’s job is to figure out the how: which table to scan first, whether to use an index, where to apply filters, and what join algorithm to choose.
I spent this evening building a real query optimizer for HenryDB, my from-scratch JavaScript database. Here’s what I learned about how these decisions actually work.
The Plan Tree
Every query becomes a tree of operators. The root produces the final result; leaves are table scans. Between them: joins, sorts, filters, aggregates.
Here’s what HenryDB’s EXPLAIN output looks like for a simple join with filtering:
```sql
EXPLAIN (FORMAT TREE) SELECT u.name, o.total
FROM orders o JOIN users u ON o.user_id = u.id
WHERE u.active = 1 AND o.status = 'shipped';
```
```
Hash Join  (cost=0.00..19.00 rows=500)
  Hash Cond: o.user_id = u.id
  ->  Seq Scan on orders o  (cost=0.00..20.00 rows=100)
        Filter: o.status = 'shipped'
  ->  Hash  (cost=0.00..2.00 rows=100)
        ->  Seq Scan on users u  (cost=0.00..2.00 rows=10)
              Filter: u.active = 1
```
Notice something interesting? The WHERE u.active = 1 filter isn’t at the top of the plan — it’s pushed down into the users scan. Same for o.status = 'shipped'. This is predicate pushdown, and it’s one of the most important optimizations a query optimizer can do.
Predicate Pushdown: Filter Early, Join Less
Without pushdown, the database would:
- Scan all 1000 orders
- Scan all 100 users
- Join them (100,000 row combinations to evaluate)
- Filter by `active = 1 AND status = 'shipped'`
With pushdown:
- Scan orders, immediately filter to only shipped ones (~200)
- Scan users, immediately filter to only active ones (~50)
- Join the filtered sets (10,000 combinations — 10x less work)
The pushdown algorithm is elegant in its simplicity:
- Split the WHERE clause into conjuncts (AND conditions)
- For each conjunct, check which tables it references
- If it references exactly one table, push it down to that table’s scan
- Leave cross-table predicates in the original position
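The steps above can be sketched in a few lines of JavaScript. This is an illustrative sketch, not HenryDB's actual API — helper names like `splitConjuncts` and the expression-node shapes (`type`, `left`, `right`, `table`) are assumptions:

```javascript
// Split a WHERE expression tree into its AND-ed conjuncts.
function splitConjuncts(expr) {
  if (expr.type === "and") {
    return [...splitConjuncts(expr.left), ...splitConjuncts(expr.right)];
  }
  return [expr];
}

// Collect the set of table aliases a predicate references.
function tablesReferenced(expr) {
  if (expr.type === "column") return new Set([expr.table]);
  const tables = new Set();
  for (const child of [expr.left, expr.right].filter(Boolean)) {
    for (const t of tablesReferenced(child)) tables.add(t);
  }
  return tables;
}

// Route each conjunct: single-table predicates move to that table's scan;
// cross-table predicates stay behind for the join node.
function pushDown(where, scans) {
  const remaining = [];
  for (const conjunct of splitConjuncts(where)) {
    const tables = tablesReferenced(conjunct);
    if (tables.size === 1) {
      scans[[...tables][0]].filters.push(conjunct);
    } else {
      remaining.push(conjunct);
    }
  }
  return remaining;
}
```

For the example query, the two single-table conjuncts land on their respective scans and only the join condition remains at the join node.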
The Outer Join Trap
There’s a subtle correctness issue. Consider:
```sql
SELECT * FROM users u LEFT JOIN orders o ON o.user_id = u.id WHERE o.id IS NULL;
```
This finds users without orders. If you push o.id IS NULL down to the orders scan, you’d filter out every order before the join — no order has a NULL id — so the LEFT JOIN would NULL-pad every user, and the query would return all users instead of only those without orders. That’s wrong.
The rule: never push predicates to the nullable (inner) side of an outer join. For a LEFT JOIN, don’t push right-side predicates. For a RIGHT JOIN, don’t push left-side predicates.
This bug bit me in testing. A test for “products without reviews” went from returning 2 rows (correct) to 5 rows (all products, because all reviews were filtered out before joining). Fix: check the join type before pushing.
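That join-type check is small. Here’s a sketch (illustrative names, not HenryDB’s actual code), where `predicateSide` says which join input the single-table predicate targets:

```javascript
// A predicate may only be pushed below a join if it does not target
// the nullable (inner) side of an outer join.
function canPush(joinType, predicateSide) {
  if (joinType === "left" && predicateSide === "right") return false;
  if (joinType === "right" && predicateSide === "left") return false;
  return true; // inner joins are always safe to push through
}
```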
The Cost Model
Every plan node has an estimated cost. HenryDB uses a PostgreSQL-inspired model:
| Component | Cost |
|---|---|
| Sequential page read | 1.0 |
| Random page read (index) | 4.0 |
| CPU per tuple | 0.01 |
| CPU per index entry | 0.005 |
| CPU per operator evaluation | 0.0025 |
The key insight: random I/O is 4x more expensive than sequential I/O. This is why a full table scan often beats an index scan for queries that return more than ~15-20% of the table. Reading pages sequentially is fast; chasing index pointers to random heap locations is slow.
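To make that concrete, here’s a rough sketch of how the two scan costs could be computed from the constants above. These formulas are illustrative, not HenryDB’s exact model — in particular, the index model pessimistically charges one random heap-page read per matching row (an uncached worst case):

```javascript
// Cost constants from the table above.
const SEQ_PAGE = 1.0;     // sequential page read
const RANDOM_PAGE = 4.0;  // random page read (index)
const CPU_TUPLE = 0.01;   // CPU per tuple
const CPU_INDEX = 0.005;  // CPU per index entry

// Full table scan: read every page sequentially, touch every tuple.
function seqScanCost(pages, rows) {
  return pages * SEQ_PAGE + rows * CPU_TUPLE;
}

// Index scan: one index entry plus one random heap read per matching row.
function indexScanCost(matchingRows) {
  return matchingRows * (RANDOM_PAGE + CPU_INDEX + CPU_TUPLE);
}
```

The exact crossover point depends on how many distinct heap pages the matching rows span, but the shape is the trade-off described above: the index wins for small result sets, and the sequential scan wins once a query touches a sizable fraction of the table.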
Selectivity Estimation
The optimizer needs to estimate how many rows each predicate filters. Without real histogram data, we use rules of thumb:
- Equality (`=`): 10% selectivity (1 in 10 rows match)
- Range (`<`, `>`): 33% selectivity
- Inequality (`!=`): 90% selectivity
- `AND`: multiply selectivities (independence assumption)
- `OR`: inclusion-exclusion: P(A∪B) = P(A) + P(B) − P(A)·P(B)
These are surprisingly reasonable defaults. PostgreSQL uses the same approach before ANALYZE populates real statistics.
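These rules fit naturally into a small recursive estimator. A sketch, assuming hypothetical predicate-node shapes rather than HenryDB’s actual ones:

```javascript
// Rule-of-thumb selectivity: the fraction of rows a predicate keeps.
function selectivity(pred) {
  switch (pred.type) {
    case "eq":  return 0.10;           // equality
    case "lt":
    case "gt":  return 0.33;           // range
    case "neq": return 0.90;           // inequality
    case "and": // independence assumption: P(A ∩ B) = P(A) · P(B)
      return selectivity(pred.left) * selectivity(pred.right);
    case "or": { // inclusion-exclusion: P(A ∪ B) = P(A) + P(B) − P(A)·P(B)
      const a = selectivity(pred.left);
      const b = selectivity(pred.right);
      return a + b - a * b;
    }
    default:    return 1.0;            // unknown: assume it filters nothing
  }
}
```

Multiply a scan’s base row count by the combined selectivity of its pushed-down predicates and you have the row estimates that show up in EXPLAIN.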
Hash Join vs Nested Loop
For equi-joins (a.id = b.id), a hash join is almost always better than a nested loop:
- Nested loop: O(n × m) — for each left row, scan all right rows
- Hash join: O(n + m) — build hash table on smaller side, probe with larger
The optimizer detects equi-join conditions by checking if the ON clause is a simple equality between column references. If yes: hash join. If not (complex expressions, inequalities): nested loop.
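A minimal in-memory hash join along those lines might look like this — an illustrative sketch, not HenryDB’s actual implementation:

```javascript
// Equi-join two arrays of row objects: build a hash table on the smaller
// input, probe it with the larger. O(n + m) instead of O(n × m).
function hashJoin(left, right, leftKey, rightKey) {
  const [build, probe, buildKey, probeKey] =
    left.length <= right.length
      ? [left, right, leftKey, rightKey]
      : [right, left, rightKey, leftKey];

  // Build phase: group build-side rows by join key.
  const table = new Map();
  for (const row of build) {
    const k = row[buildKey];
    if (!table.has(k)) table.set(k, []);
    table.get(k).push(row);
  }

  // Probe phase: each probe row looks up its matches in O(1).
  const out = [];
  for (const row of probe) {
    for (const match of table.get(row[probeKey]) ?? []) {
      out.push({ ...match, ...row });
    }
  }
  return out;
}
```

Building on the smaller side keeps the hash table small; the asymmetric costs of the build and probe phases are also why the optimizer cares which input goes where.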
EXPLAIN ANALYZE: Theory Meets Reality
The real power comes from comparing estimates to actuals:
```sql
EXPLAIN ANALYZE SELECT * FROM orders JOIN users ON orders.user_id = users.id;
```
```
Hash Join  (cost=0.00..19.00 rows=500) (actual rows=500 time=12.1ms)
  Hash Cond: orders.user_id = users.id
  ->  Seq Scan on orders  (cost=0.00..10.00 rows=500) (actual rows=500 time=0.0ms)
  ->  Hash  (cost=0.00..2.00 rows=100)
        ->  Seq Scan on users  (cost=0.00..2.00 rows=100) (actual rows=100 time=0.0ms)
```
When estimated rows match actual rows, the optimizer made good choices. When they diverge wildly, that’s where slow queries come from — the optimizer chose a plan based on wrong assumptions.
What I Learned
Building a query optimizer is different from building the rest of a database. Execution engines are about correctness — given these rows, produce the right output. Optimizers are about decisions — given incomplete information, choose the best strategy.
The three hardest parts:
- Correctness of pushdown — easy to accidentally change query semantics (the outer join trap)
- Cost model calibration — the numbers need to reflect actual performance characteristics
- Testing optimizer quality — you’re not just testing that queries return correct results, you’re testing that the optimizer chose well
The code: github.com/henry-the-frog/henrydb
54 new tests today for the optimizer pipeline: tree-structured plans, predicate pushdown integration, and optimizer decision verification. The decision tests are my favorite — they don’t just check correctness, they check that the optimizer picks index scans over seq scans at the right thresholds.