Summary
The actor-critic pattern separates generation from evaluation. One agent (the actor/writer) produces output, while another (the critic/reviewer) evaluates and improves it. This separation creates higher-quality results than single-pass generation. The pattern applies to code, documentation, design docs, and any content where iterative refinement adds value.
The Core Pattern
```
┌────────────────────┐
│  ACTOR (Writer)    │
│  - Generates       │
│  - Creates quickly │
│  - Optimistic      │
└────────┬───────────┘
         │
         ▼
   [Draft output]
         │
         ▼
┌────────────────────┐
│  CRITIC (Reviewer) │
│  - Evaluates       │
│  - Finds issues    │
│  - Skeptical       │
└────────┬───────────┘
         │
         ├── Issues? → Actor revises → Critic re-reviews
         │
         └── No issues → Done
```
The actor focuses on completing the task. The critic focuses on finding problems. Neither role is “more important.” Together they produce better results than either could alone.
Why Separation Works
Single-pass generation has a fundamental conflict: the same mind that creates content cannot objectively evaluate it. The author sees what they intended to write, not what they actually wrote.
Separating roles fixes this:
- The actor writes freely without self-censoring or over-engineering
- The critic reviews objectively without ownership bias
- Multiple passes catch issues that fresh eyes reveal
This mirrors effective human workflows. Writers benefit from editors. Developers benefit from code review. The pattern encodes this wisdom into AI systems.
Two Implementation Approaches
Approach 1: Single Agent, Alternating Roles
Same LLM switches between actor and critic personas:
```typescript
// Round 1: Write
const draft = await llm.generate(`
  You are a technical writer creating a PRD.
  Write a PRD for: ${feature}
`);

// Round 2: Review
const critique = await llm.generate(`
  You are a senior product manager reviewing a PRD.
  Find gaps, unclear requirements, and missing edge cases.

  PRD to review:
  ${draft}
`);

// Round 3: Revise
const revised = await llm.generate(`
  You are improving a PRD based on feedback.

  Original PRD:
  ${draft}

  Feedback:
  ${critique}

  Revise to address all issues.
`);
```
Advantages: Simpler, cheaper (one model), maintains context between rounds.
Disadvantages: Same model biases persist, limited adversarial tension.
Approach 2: Separate Agents
Distinct agent instances with specialized configurations:
```typescript
const writer = new Agent({
  role: 'technical-writer',
  persona: 'Write clear, concise documentation. Follow style guides.',
  tools: ['Read', 'Glob', 'Write'],
});

const reviewer = new Agent({
  role: 'documentation-reviewer',
  persona: `Review for clarity, accuracy, and completeness.
    Be critical. Find every gap and inconsistency.
    Only approve if genuinely ready for publication.`,
  tools: ['Read', 'Grep'], // Read-only: the critic cannot modify the doc
});

// Execute loop
const maxRounds = 5;
let rounds = 0;
let doc = await writer.create(task);
let approved = false;

while (!approved && rounds < maxRounds) {
  const review = await reviewer.evaluate(doc);
  if (review.verdict === 'APPROVED') {
    approved = true;
  } else {
    doc = await writer.revise(doc, review.feedback);
  }
  rounds++;
}
```
Advantages: True separation of concerns, can use different models (cheaper critic), stronger adversarial tension.
Disadvantages: More complex setup, context not shared automatically.
When to Use Each Approach
| Scenario | Recommended Approach |
|---|---|
| Quick drafts, low stakes | Single agent |
| Security-critical code | Separate agents |
| Documentation, PRDs | Either works |
| High-visibility content | Separate agents |
| Tight budget | Single agent |
| Need audit trail | Separate agents |
Application Beyond Code
The actor-critic pattern is not limited to code review. It applies wherever quality matters:
Documentation
Actor (Writer): Creates technical documentation following style guides.
Critic (Editor): Checks for clarity, accuracy, completeness, and jargon.
## Critique Dimensions for Documentation
1. **Clarity**: Can a newcomer understand this?
2. **Accuracy**: Do code examples work? Are facts correct?
3. **Completeness**: Are edge cases covered? Missing sections?
4. **Consistency**: Does terminology match the codebase?
5. **Actionability**: Can readers follow the instructions?
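A checklist like this can be turned into a critic prompt programmatically. The following is a minimal sketch; the `Dimension` type, the `buildCritiquePrompt` helper, and the prompt wording are illustrative, not a fixed API.

```typescript
// Hypothetical helper: builds a documentation-critic prompt
// from a list of critique dimensions.
interface Dimension {
  name: string;
  question: string;
}

const docDimensions: Dimension[] = [
  { name: "Clarity", question: "Can a newcomer understand this?" },
  { name: "Accuracy", question: "Do code examples work? Are facts correct?" },
  { name: "Completeness", question: "Are edge cases covered? Missing sections?" },
  { name: "Consistency", question: "Does terminology match the codebase?" },
  { name: "Actionability", question: "Can readers follow the instructions?" },
];

function buildCritiquePrompt(doc: string, dimensions: Dimension[]): string {
  // Number each dimension so the critic can report issues per item.
  const checklist = dimensions
    .map((d, i) => `${i + 1}. ${d.name}: ${d.question}`)
    .join("\n");
  return [
    "You are a documentation reviewer. Evaluate the document below",
    "against each dimension and list concrete issues per dimension.",
    "",
    checklist,
    "",
    "Document to review:",
    doc,
  ].join("\n");
}
```

Keeping the dimensions as data rather than baking them into the prompt string makes it easy to reuse the same builder for PRDs or chapters with a different list.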
PRDs and Design Docs
Actor (Author): Writes requirements and design proposals.
Critic (Stakeholder): Challenges assumptions, finds gaps, questions feasibility.
## Critique Dimensions for PRDs
1. **Requirements completeness**: All user needs captured?
2. **Edge cases**: What happens when things go wrong?
3. **Dependencies**: Are external dependencies identified?
4. **Measurability**: How will success be measured?
5. **Feasibility**: Can engineering build this as specified?
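Looping on a critic's output is easier when its free-text review is parsed into a structured verdict. The sketch below assumes a convention of a `VERDICT:` line followed by `- ` issue bullets; that format is an assumption for illustration, not something the pattern mandates.

```typescript
interface Review {
  verdict: "APPROVED" | "REVISE";
  issues: string[];
}

// Parses critic output of the (assumed) form:
//   VERDICT: REVISE
//   - requirements lack success metrics
//   - no dependency list
function parseReview(text: string): Review {
  const verdict = /VERDICT:\s*APPROVED/i.test(text) ? "APPROVED" : "REVISE";
  const issues = text
    .split("\n")
    .filter((line) => line.trim().startsWith("- "))
    .map((line) => line.trim().slice(2));
  return { verdict, issues };
}
```

Defaulting to `REVISE` when the verdict line is missing or malformed errs on the side of another round rather than silently approving.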
Book Chapters
The RALPH loop uses actor-critic for book writing:
Actor (Chapter Writer): Writes first draft following PRD and sources.
Critic (Reviewer Agents): Multiple specialized reviewers check different dimensions.
## Review Agents for Book Chapters
- **slop-checker**: Finds AI-text tells (delve, crucial, moreover)
- **tech-accuracy**: Validates code examples and tool references
- **term-intro-checker**: Ensures acronyms are defined
- **oreilly-style**: Applies publishing conventions
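Because these reviewers check independent dimensions, they can run concurrently and their findings merged afterward. The sketch below assumes each reviewer is a function returning a list of issues; the `Reviewer` signature and `reviewChapter` name are illustrative.

```typescript
// Sketch: fan one chapter draft out to several specialized
// reviewers in parallel and collect their findings by name.
type Reviewer = (draft: string) => Promise<string[]>; // returns issues found

async function reviewChapter(
  draft: string,
  reviewers: Record<string, Reviewer>,
): Promise<Record<string, string[]>> {
  const entries = await Promise.all(
    Object.entries(reviewers).map(async ([name, review]) => {
      const issues = await review(draft);
      return [name, issues] as const;
    }),
  );
  return Object.fromEntries(entries);
}
```

Keying the result by reviewer name preserves which dimension each issue came from, which helps the actor prioritize revisions.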
Stopping Criteria
The loop needs exit conditions:
- Approval: Critic finds no significant issues
- Max rounds: Typically 3-5 rounds (diminishing returns)
- Improvement stall: Less than 10% issue reduction between rounds
- Escalation: Hand off to human when stuck
```typescript
function shouldStop(round: number, issues: number, prevIssues: number): boolean {
  if (issues === 0) return true;      // Approved
  if (round >= 5) return true;        // Max rounds
  if (prevIssues === 0) return false; // No baseline yet (avoids divide-by-zero)
  const improvement = (prevIssues - issues) / prevIssues;
  if (improvement < 0.1) return true; // Stalled: less than 10% issue reduction
  return false;
}
```
Cost vs Quality Trade-offs
More rounds cost more but catch more issues:
| Rounds | Typical Cost | Issues Caught |
|---|---|---|
| 1 | $0.05-0.15 | 0% (no review) |
| 2 | $0.10-0.30 | 60-70% |
| 3 | $0.15-0.45 | 80-85% |
| 4 | $0.20-0.60 | 90-95% |
| 5 | $0.25-0.75 | 95%+ |
For high-stakes content, 3-5 rounds are worth the cost. For drafts and internal docs, 1-2 rounds suffice.
Common Pitfalls
1. Critic Too Harsh
Problem: Critic never approves, creates infinite loops.
Fix: Set maximum rounds. Accept “good enough.” Have critic distinguish critical vs minor issues.
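One concrete way to encode "distinguish critical vs minor" is to block approval only on critical issues. The `Issue` shape and severity labels below are an assumed convention, sketched for illustration.

```typescript
interface Issue {
  severity: "critical" | "minor";
  description: string;
}

// Approve when no critical issues remain, even if minor nits exist;
// this keeps a perfectionist critic from looping forever.
function verdictFor(issues: Issue[]): "APPROVED" | "REVISE" {
  return issues.some((i) => i.severity === "critical") ? "REVISE" : "APPROVED";
}
```

Minor issues can still be passed to the actor as optional polish, but they no longer gate termination.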
2. Critic Too Lenient
Problem: Critic approves everything, provides no value.
Fix: Use adversarial prompting. Require specific issue counts. Set quality thresholds.
3. Lost Context Between Rounds
Problem: Actor forgets original requirements when revising.
Fix: Include original task in every revision prompt. Use session-based agents.
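The first fix can be mechanical: build every revision prompt from a template that restates the original task. A minimal sketch, with illustrative wording:

```typescript
// Sketch: every revision prompt carries the original task verbatim,
// so the actor cannot drift from the initial requirements.
function buildRevisionPrompt(task: string, draft: string, feedback: string): string {
  return [
    "Original task (must still be satisfied in full):",
    task,
    "",
    "Current draft:",
    draft,
    "",
    "Reviewer feedback:",
    feedback,
    "",
    "Revise the draft to address the feedback without dropping any original requirement.",
  ].join("\n");
}
```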
4. Over-Engineering
Problem: Each round adds complexity without improving quality.
Fix: Critic should enforce simplicity. “Only fix real issues, don’t add features.”
Integration with RALPH
The RALPH loop uses actor-critic at multiple levels:
- Micro-level: Within each task, code/content is generated then reviewed
- Macro-level: Every 6 iterations, review agents run across all output
- Meta-level: Progress summarizer evaluates overall quality metrics
This multi-scale application of actor-critic creates compounding quality.
Related
- Actor-Critic Adversarial Coding – Deep dive on code-specific critique with 8 dimensions
- Agent Swarm Patterns – Multiple critics for diverse perspectives
- Sub-Agent Architecture – Orchestrating multiple specialized agents
- Quality Gates as Information Filters – Actor-critic as pre-gate quality improvement
- Trust But Verify Protocol – Verification before accepting AI output

