Summary
Instead of staring at mega-diff walls, have an agent read the ticket and the diff, then hold a conversation with it about intent, impact, and risk. The agent surfaces things impacted by the change but not in the diff. This is how human code review scales to large PRs.
The Core Insight
The diff shows what text changed. It does not show what those changes mean, what they break downstream, or why they were made. An agent can bridge that gap by reading the ticket (intent), the diff (what changed), and the surrounding codebase (what’s affected). You then talk to the agent at a higher level than line-by-line review.
This is the same principle behind Monitor Generation from Diffs: use the diff as a seed to reason about system-wide impact beyond the changed lines. Monitor generation applies this post-merge for observability. Conversational review applies it pre-merge so the reviewer actually understands the PR before approving.
The Workflow
1. Agent reads the ticket/issue (understand intent)
2. Agent reads the diff (understand what changed)
3. Agent researches the codebase (understand impact)
- What calls the changed code?
- What assumptions does downstream code make?
- What's affected but NOT in the diff?
4. Reviewer has a conversation with the agent
- Old behavior vs new behavior
- Risk areas and edge cases
- What could break
The output is not a list of line comments. It is understanding.
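The context-gathering half of the workflow (steps 1-3) can be sketched in code. This is a minimal illustration, not any tool's actual implementation: `find_callers` is a naive text search standing in for real codebase research, and the agent call itself is omitted; we only assemble the intent/change/impact context the agent would receive.

```python
def find_callers(changed_symbols, codebase):
    """Map each changed symbol to the files that mention it.

    codebase is a {path: source_text} dict; a real tool would use an
    AST or call graph instead of substring search.
    """
    return {
        sym: [path for path, src in codebase.items() if sym in src]
        for sym in changed_symbols
    }


def build_review_context(ticket, diff, codebase, changed_symbols, changed_files):
    """Assemble intent (ticket), change (diff), and impact (callers
    outside the diff) into one context block for the review agent."""
    impact = {
        sym: [f for f in files if f not in changed_files]
        for sym, files in find_callers(changed_symbols, codebase).items()
    }
    return (
        "INTENT (ticket):\n" + ticket + "\n\n"
        "CHANGE (diff):\n" + diff + "\n\n"
        "IMPACT (callers not in the diff):\n" + repr(impact)
    )
```

The key design point is the IMPACT section: it explicitly lists files that reference changed symbols but do not appear in the diff, which is exactly the blind spot of flat diff review.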
Why This Scales
Large PRs (50+ files) from AI agents are becoming routine. No human can hold that much context by reading a flat diff. Two approaches exist:
- Structural decomposition. Tools like flowdiff cluster related file changes into logical review units ordered by data flow. You read code in the order it executes, not in alphabetical file order. This handles the "what changed" side.
- Conversational review. An agent that has read the ticket and diff answers questions about intent and impact. This handles the "so what" side.
Both together give you structural clarity (flowdiff) and semantic understanding (agent conversation). Neither alone is sufficient for mega-diffs.
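flowdiff's internals are not documented here, but the ordering idea behind structural decomposition, read producers before consumers, is just a topological sort over a dependency graph. A minimal sketch, assuming a toy `deps` map (file to the files it depends on) that a real tool would derive from imports or data flow:

```python
from graphlib import TopologicalSorter


def review_order(changed_files, deps):
    """Order changed files so that dependencies are read before the
    code that consumes them (data-flow order, not alphabetical).

    deps maps file -> set of files it depends on; edges to unchanged
    files are dropped since they are not part of this review.
    """
    changed = set(changed_files)
    graph = {f: deps.get(f, set()) & changed for f in changed_files}
    return list(TopologicalSorter(graph).static_order())
```

With `models.py -> service.py -> api.py` as the dependency chain, the reviewer reads `models.py` first and `api.py` last, mirroring how data flows through the change.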
What the Agent Catches That Diffs Miss
- Code that depends on changed behavior but was not modified
- Implicit contracts (ordering, nullability, timing) that the change violates
- Test coverage gaps for the new behavior
- Configuration or environment assumptions that no longer hold
- Migration or deployment ordering requirements
These are the things that pass CI, get approved, and break production.
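A concrete (hypothetical) instance of the first two bullets: a PR drops a sort from a producer for speed, and an unchanged caller that implicitly relied on ordering silently returns the wrong answer. Both functions below are invented for illustration.

```python
# Before the PR: events come back sorted by timestamp.
def get_events_old():
    return sorted([(2, "deploy"), (3, "rollback"), (1, "build")])


# The PR: skip the sort "because the DB is usually ordered anyway".
def get_events_new():
    return [(2, "deploy"), (3, "rollback"), (1, "build")]


# NOT in the diff: this caller assumes the list is sorted, so the
# last element is the newest event. The implicit ordering contract
# is violated, but no line here changed, so a flat diff review
# never looks at it.
def latest_event(events):
    return events[-1]
```

Before the change, `latest_event` returns the event with timestamp 3; after it, it returns timestamp 1. Nothing in the diff itself hints at the breakage, which is why the agent's codebase research step matters.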
Relationship to Existing Patterns
| Pattern | Stage | Output | Shared Insight |
|---|---|---|---|
| Conversational Code Review | Pre-merge | Interactive understanding | Diff as seed for impact analysis |
| Monitor Generation from Diffs | Post-merge | Production monitors | Diff as seed for impact analysis |
| LLM Code Review in CI | Pre-merge | Automated PR comments | Diff as input (but no impact beyond diff) |
Source
Pieter Levels on the Lex Fridman Podcast (2026). The insight: for large code reviews, have an agent read the ticket and the diff, then have a conversation with the agent about it at a higher level. The result is better than staring at mega-diff walls because the agent surfaces impact beyond the changed lines.

