When agent throughput exceeds human review capacity, corrections become cheap and waiting becomes expensive. The merge strategy that was responsible at low throughput becomes the bottleneck at high throughput.
Author: James Phoenix | Date: March 2026
The Traditional Model
In human-driven development, blocking merge gates are a net positive. A pull request sits in a queue. A reviewer reads the diff, leaves comments, requests changes. The author addresses them. The reviewer approves. The code merges.
This works because the cost of waiting is low relative to the cost of merging a bad change. Human engineers produce maybe 1-2 PRs per day. Review capacity roughly matches creation rate. The queue stays short. A PR that waits a few hours loses almost nothing. A bad merge that breaks production costs a lot.
The entire process assumes a world where code creation is the bottleneck and review is cheap relative to fixing post-merge problems. That assumption held for decades.
What Changes at High Throughput
OpenAI’s Codex team, building an internal product with zero manually-written code, observed a throughput of 3.5 pull requests per engineer per day. As the team grew from 3 to 7 engineers, per-engineer throughput actually increased (the opposite of the mythical man-month). After five months, the repository contained roughly one million lines of code across ~1,500 merged PRs.
At that rate, the economics of merge gates invert.
A PR that blocks for four hours waiting on review is not “being careful.” It is blocking the next three PRs that depend on it. Each blocked PR stalls the agent that produced it and any downstream agents that need the merged state. The cost compounds. Four hours of blocking at 3.5 PRs/day/engineer across a seven-person team means dozens of blocked work items per day.
Meanwhile, the cost of merging a flawed change and correcting it afterward drops. Agents can produce correction PRs quickly. The fix cycle (detect problem, generate fix, merge fix) takes less time than the review cycle (wait for reviewer, address comments, wait for re-review, merge). When correction is faster than prevention, the optimal strategy shifts toward merging fast and fixing forward.
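A back-of-envelope sketch makes the compounding concrete. The workday length and downstream-dependency factor below are illustrative assumptions, not figures from the Codex team:

```python
# Back-of-envelope model of blocking cost at high PR throughput.
# Team size and PR rate come from the article; the workday length and
# downstream factor are illustrative assumptions.

ENGINEERS = 7
PRS_PER_ENGINEER_PER_DAY = 3.5
WORKDAY_HOURS = 8
BLOCK_HOURS = 4          # time a PR waits on human review
DOWNSTREAM_PER_PR = 3    # assumed PRs that depend on the blocked one

prs_per_day = ENGINEERS * PRS_PER_ENGINEER_PER_DAY          # 24.5
blocked_fraction = BLOCK_HOURS / WORKDAY_HOURS              # half the workday
directly_blocked = prs_per_day * blocked_fraction
total_stalled = directly_blocked * (1 + DOWNSTREAM_PER_PR)  # blocked PR + dependents

print(f"PRs created per day:       {prs_per_day}")
print(f"Stalled items per day:     {total_stalled:.0f}")
```

Under these assumptions, a four-hour review wait stalls roughly 49 work items per day, which is where "dozens of blocked work items" comes from.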
The Inversion
The traditional merge philosophy optimizes for a world where:
- Creation rate is low (1-2 PRs/day/engineer)
- Review capacity roughly matches creation rate
- Post-merge fixes are expensive (human must understand, diagnose, fix)
- Waiting cost is low (nothing else is blocked)
The agent-driven merge philosophy optimizes for a world where:
- Creation rate is high (3-5+ PRs/day/engineer)
- Review capacity is far below creation rate
- Post-merge fixes are cheap (agent generates correction quickly)
- Waiting cost is high (downstream work is blocked)
| Factor | Traditional | Agent-Driven |
|---|---|---|
| PR creation rate | 1-2/day/engineer | 3-5+/day/engineer |
| Review bottleneck | Rarely | Constantly |
| Cost of waiting | Low | High (compounds) |
| Cost of correction | High (human diagnosis) | Low (agent fix-forward) |
| Optimal strategy | Block until reviewed | Merge fast, correct fast |
This is not about being reckless. It is about recognizing that the cost function changed.
What This Looks Like in Practice
The Codex team adopted minimal blocking merge gates. Concretely:
Short-lived PRs. PRs were kept small and merged quickly rather than accumulating into large review-heavy changesets. Smaller PRs are easier to verify mechanically and cheaper to revert if something breaks.
Mechanical verification over human review. Custom linters, structural tests, and CI pipelines replaced much of what code review traditionally catches. The linters were written to produce error messages formatted for agent consumption, so when a constraint was violated, the agent could self-correct without human involvement.
Test flakes handled with follow-up runs. Rather than blocking a merge on an intermittent test failure and pulling a human in to investigate, the team re-ran failing tests. If a test failed consistently, it was a real problem. If it passed on retry, the merge proceeded and a follow-up task was created.
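A minimal sketch of such a retry policy, with hypothetical function names:

```python
# Sketch of a retry-on-flake merge policy: re-run a failing test; a pass
# on retry lets the merge proceed and files a follow-up task, while
# consistent failure blocks the merge. All names are hypothetical.

def merge_decision(run_test, max_retries: int = 2) -> tuple[bool, list[str]]:
    """Return (merge_ok, follow_up_tasks) for one suspect test."""
    follow_ups = []
    for attempt in range(1 + max_retries):
        if run_test():
            if attempt > 0:  # passed only on retry: likely a flake
                follow_ups.append("investigate intermittent test failure")
            return True, follow_ups
    return False, follow_ups  # failed every attempt: real problem, block

# A test that fails once, then passes -- a classic flake.
outcomes = iter([False, True])
ok, tasks = merge_decision(lambda: next(outcomes))
print(ok, tasks)
```

The asymmetry is deliberate: a flake costs one follow-up task, while a consistent failure still blocks the merge, so the gate stays meaningful.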
Corrections treated as normal flow. A merged PR that introduced a minor issue was not a failure of the review process. It was a normal part of the development loop. The next agent run would detect and fix it. The total time from introduction to fix was shorter than the time a traditional review cycle would have taken.
When This Does Not Apply
This inversion only holds when specific conditions are met. Without them, blocking merge gates remain the better default.
You need strong mechanical verification. If your codebase lacks good types, lints, and tests, merging fast means merging garbage. The Codex team invested heavily in custom linters with agent-readable error messages, structural tests that enforced architectural invariants, and a full local observability stack. The mechanical verification replaced human review, it did not simply skip it.
You need fast correction cycles. If fixing a bad merge takes longer than preventing it, the inversion does not hold. This means the agent must be able to detect issues (via tests, lints, or observability) and produce fixes quickly.
You need isolation between work streams. Merging fast with parallel agents requires that a bad merge in one work stream does not cascade into others. Git worktree isolation, feature flags, and modular architecture all help here.
You need revertability. Fast merging only works if you can revert fast. Small, atomic PRs are easier to revert than large ones. If your merge process squashes each PR into one commit, reversion is a single `git revert`. If it does not, backing out a PR means unwinding several commits (or reverting a merge commit with `git revert -m 1`), which is slower and easier to get wrong.
The Deeper Point
The merge philosophy is not a cultural preference. It is an optimization problem. You are minimizing total cost, which is the sum of waiting cost and correction cost. In a low-throughput world, correction cost dominates, so you invest in prevention (reviews). In a high-throughput world, waiting cost dominates, so you invest in detection and correction (mechanical verification + fast fix-forward).
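That trade-off can be written down directly. Every parameter below is an illustrative assumption; the point is which term dominates in each regime:

```python
# The merge policy as an optimization: total cost = waiting + correction.
# All parameters are illustrative assumptions, not measured values.

def total_cost(prs_per_day, wait_hours, defect_rate, fix_hours, downstream):
    waiting = prs_per_day * wait_hours * (1 + downstream)  # blocked PR + dependents
    correction = prs_per_day * defect_rate * fix_hours     # post-merge fixes
    return waiting + correction

# Low throughput: human fixes are slow (16h), little downstream blocking.
block_low = total_cost(2, wait_hours=4, defect_rate=0.05, fix_hours=16, downstream=0)
merge_low = total_cost(2, wait_hours=0.2, defect_rate=0.30, fix_hours=16, downstream=0)

# High throughput: agent fixes are fast (0.5h), blocking cascades downstream.
block_high = total_cost(24.5, wait_hours=4, defect_rate=0.05, fix_hours=0.5, downstream=3)
merge_high = total_cost(24.5, wait_hours=0.2, defect_rate=0.30, fix_hours=0.5, downstream=3)

print(f"low throughput:  block={block_low:.1f}h  merge-fast={merge_low:.1f}h")
print(f"high throughput: block={block_high:.1f}h  merge-fast={merge_high:.1f}h")
```

With these numbers, blocking narrowly wins at low throughput and merge-fast wins by more than an order of magnitude at high throughput: the same cost function, different dominant term.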
Most engineering teams treat their merge process as an identity rather than a policy. “We do thorough code review” is a statement of values, not an engineering decision. When the cost function changes, the optimal policy changes with it, regardless of how it feels.
The teams that will struggle most with this transition are the ones that conflate process discipline with specific processes. Discipline means optimizing for the constraints you actually face. It does not mean running the same playbook when the game changed.
The Practical Takeaway
If your team is producing more PRs than you can review:
- Invest in mechanical verification (lints, types, tests, CI) until you trust your automated gates
- Shrink PR size so each merge is low-risk and easy to revert
- Reduce blocking gates to the minimum that your mechanical verification supports
- Treat post-merge corrections as normal workflow, not failures
- Track queue depth and wait time as signals that your process needs adjustment
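A minimal sketch of computing those last two signals, using hypothetical PR timestamps:

```python
# Sketch of the two signals worth tracking: review queue depth and mean
# per-PR wait time. The PR data and timestamps are hypothetical.
from datetime import datetime
from statistics import mean

open_prs = [  # (opened_at, reviewed_at or None)
    (datetime(2026, 3, 2, 9, 0), datetime(2026, 3, 2, 11, 0)),
    (datetime(2026, 3, 2, 9, 30), None),   # still waiting
    (datetime(2026, 3, 2, 10, 0), None),   # still waiting
]

now = datetime(2026, 3, 2, 14, 0)
queue_depth = sum(1 for _, reviewed in open_prs if reviewed is None)
wait_hours = mean(
    ((reviewed or now) - opened).total_seconds() / 3600
    for opened, reviewed in open_prs
)

print(f"queue depth: {queue_depth}, mean wait: {wait_hours:.1f}h")
```

Rising queue depth with flat review capacity is the early warning that your merge policy no longer matches your throughput.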
The goal is not to stop reviewing code. The goal is to stop blocking on review when the cost of blocking exceeds the cost of correcting.
Related
- Building the Harness – The four-layer harness that makes mechanical verification possible
- Six-Layer Lint Harness – Real-world mechanical verification that replaces review for structural correctness
- Git Worktrees for Parallel Dev – Isolation between parallel agent work streams
- Agent-Driven Development – The broader workflow where merge philosophy matters
- The Human Bottleneck Is a Quality Mechanism – The counterargument: human slowness serves a purpose
- CI/CD Patterns for AI Agents – GitHub Actions patterns for agent verification gates
References
- OpenAI Engineering: Harness Engineering – Source for Codex team throughput data and merge philosophy

