Summary
The actor-critic pattern separates generation from evaluation. One agent (the actor/writer) produces output, while another (the critic/reviewer) evaluates and improves it. This separation creates higher-quality results than single-pass generation. The pattern applies to code, documentation, design docs, and any content where iterative refinement adds value.
The Core Pattern
```
┌────────────────────┐
│  ACTOR (Writer)    │
│  - Generates       │
│  - Creates quickly │
│  - Optimistic      │
└────────┬───────────┘
         │
         ▼
   [Draft output]
         │
         ▼
┌────────────────────┐
│  CRITIC (Reviewer) │
│  - Evaluates       │
│  - Finds issues    │
│  - Skeptical       │
└────────┬───────────┘
         │
         ├── Issues? → Actor revises → Critic re-reviews
         │
         └── No issues → Done
```
The actor focuses on completing the task. The critic focuses on finding problems. Neither role is “more important.” Together they produce better results than either could alone.
Why Separation Works
Single-pass generation has a fundamental conflict: the same mind that creates content cannot objectively evaluate it. The author sees what they intended to write, not what they actually wrote.
Separating roles fixes this:
- The actor writes freely without self-censoring or over-engineering
- The critic reviews objectively without ownership bias
- Multiple passes catch issues that fresh eyes reveal
This mirrors effective human workflows. Writers benefit from editors. Developers benefit from code review. The pattern encodes this wisdom into AI systems.
Two Implementation Approaches
Approach 1: Single Agent, Alternating Roles
Same LLM switches between actor and critic personas:
```typescript
// Round 1: Write
const draft = await llm.generate(`
  You are a technical writer creating a PRD.
  Write a PRD for: ${feature}
`);

// Round 2: Review
const critique = await llm.generate(`
  You are a senior product manager reviewing a PRD.
  Find gaps, unclear requirements, and missing edge cases.

  PRD to review:
  ${draft}
`);

// Round 3: Revise
const revised = await llm.generate(`
  You are improving a PRD based on feedback.

  Original PRD:
  ${draft}

  Feedback:
  ${critique}

  Revise to address all issues.
`);
```
Advantages: Simpler, cheaper (one model), maintains context between rounds.
Disadvantages: Same model biases persist, limited adversarial tension.
Approach 2: Separate Agents
Distinct agent instances with specialized configurations:
```typescript
const writer = new Agent({
  role: 'technical-writer',
  persona: 'Write clear, concise documentation. Follow style guides.',
  tools: ['Read', 'Glob', 'Write'],
});

const reviewer = new Agent({
  role: 'documentation-reviewer',
  persona: `Review for clarity, accuracy, and completeness.
    Be critical. Find every gap and inconsistency.
    Only approve if genuinely ready for publication.`,
  tools: ['Read', 'Grep'], // Read-only: the critic cannot modify the doc
});

// Execute loop
const maxRounds = 5;
let rounds = 0;
let doc = await writer.create(task);
let approved = false;

while (!approved && rounds < maxRounds) {
  const review = await reviewer.evaluate(doc);
  if (review.verdict === 'APPROVED') {
    approved = true;
  } else {
    doc = await writer.revise(doc, review.feedback);
  }
  rounds++;
}
```
Advantages: True separation of concerns, can use different models (cheaper critic), stronger adversarial tension.
Disadvantages: More complex setup, context not shared automatically.
When to Use Each Approach
| Scenario | Recommended Approach |
|---|---|
| Quick drafts, low stakes | Single agent |
| Security-critical code | Separate agents |
| Documentation, PRDs | Either works |
| High-visibility content | Separate agents |
| Tight budget | Single agent |
| Need audit trail | Separate agents |
Application Beyond Code
The actor-critic pattern is not limited to code review. It applies wherever quality matters:
Documentation
Actor (Writer): Creates technical documentation following style guides.
Critic (Editor): Checks for clarity, accuracy, completeness, and jargon.
## Critique Dimensions for Documentation
1. **Clarity**: Can a newcomer understand this?
2. **Accuracy**: Do code examples work? Are facts correct?
3. **Completeness**: Are edge cases covered? Missing sections?
4. **Consistency**: Does terminology match the codebase?
5. **Actionability**: Can readers follow the instructions?
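A checklist like this can be turned into a critic prompt programmatically. The following is a minimal sketch; the `Dimension` type, the `buildCritiquePrompt` helper, and the prompt wording are illustrative, not a fixed API.

```typescript
// Hypothetical helper: builds a documentation-critic prompt
// from a list of critique dimensions.
interface Dimension {
  name: string;
  question: string;
}

const docDimensions: Dimension[] = [
  { name: "Clarity", question: "Can a newcomer understand this?" },
  { name: "Accuracy", question: "Do code examples work? Are facts correct?" },
  { name: "Completeness", question: "Are edge cases covered? Missing sections?" },
  { name: "Consistency", question: "Does terminology match the codebase?" },
  { name: "Actionability", question: "Can readers follow the instructions?" },
];

function buildCritiquePrompt(doc: string, dimensions: Dimension[]): string {
  // Number each dimension so the critic can report issues per item.
  const checklist = dimensions
    .map((d, i) => `${i + 1}. ${d.name}: ${d.question}`)
    .join("\n");
  return [
    "You are a documentation reviewer. Evaluate the document below",
    "against each dimension and list concrete issues per dimension.",
    "",
    checklist,
    "",
    "Document to review:",
    doc,
  ].join("\n");
}
```

Keeping the dimensions as data rather than baking them into the prompt string makes it easy to reuse the same builder for PRDs or chapters with a different list.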
PRDs and Design Docs
Actor (Author): Writes requirements and design proposals.
Critic (Stakeholder): Challenges assumptions, finds gaps, questions feasibility.
## Critique Dimensions for PRDs
1. **Requirements completeness**: All user needs captured?
2. **Edge cases**: What happens when things go wrong?
3. **Dependencies**: Are external dependencies identified?
4. **Measurability**: How will success be measured?
5. **Feasibility**: Can engineering build this as specified?
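Looping on a critic's output is easier when its free-text review is parsed into a structured verdict. The sketch below assumes a convention of a `VERDICT:` line followed by `- ` issue bullets; that format is an assumption for illustration, not something the pattern mandates.

```typescript
interface Review {
  verdict: "APPROVED" | "REVISE";
  issues: string[];
}

// Parses critic output of the (assumed) form:
//   VERDICT: REVISE
//   - requirements lack success metrics
//   - no dependency list
function parseReview(text: string): Review {
  const verdict = /VERDICT:\s*APPROVED/i.test(text) ? "APPROVED" : "REVISE";
  const issues = text
    .split("\n")
    .filter((line) => line.trim().startsWith("- "))
    .map((line) => line.trim().slice(2));
  return { verdict, issues };
}
```

Defaulting to `REVISE` when the verdict line is missing or malformed errs on the side of another round rather than silently approving.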
Book Chapters
The RALPH loop uses actor-critic for book writing:
Actor (Chapter Writer): Writes first draft following PRD and sources.
Critic (Reviewer Agents): Multiple specialized reviewers check different dimensions.
## Review Agents for Book Chapters
- **slop-checker**: Finds AI-text tells (delve, crucial, moreover)
- **tech-accuracy**: Validates code examples and tool references
- **term-intro-checker**: Ensures acronyms are defined
- **oreilly-style**: Applies publishing conventions
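Because these reviewers check independent dimensions, they can run concurrently and their findings merged afterward. The sketch below assumes each reviewer is a function returning a list of issues; the `Reviewer` signature and `reviewChapter` name are illustrative.

```typescript
// Sketch: fan one chapter draft out to several specialized
// reviewers in parallel and collect their findings by name.
type Reviewer = (draft: string) => Promise<string[]>; // returns issues found

async function reviewChapter(
  draft: string,
  reviewers: Record<string, Reviewer>,
): Promise<Record<string, string[]>> {
  const entries = await Promise.all(
    Object.entries(reviewers).map(async ([name, review]) => {
      const issues = await review(draft);
      return [name, issues] as const;
    }),
  );
  return Object.fromEntries(entries);
}
```

Keying the result by reviewer name preserves which dimension each issue came from, which helps the actor prioritize revisions.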
Stopping Criteria
The loop needs exit conditions:
- Approval: Critic finds no significant issues
- Max rounds: Typically 3-5 rounds (diminishing returns)
- Improvement stall: Less than 10% issue reduction between rounds
- Escalation: Hand off to human when stuck
```typescript
function shouldStop(round: number, issues: number, prevIssues: number): boolean {
  if (issues === 0) return true;      // Approved
  if (round >= 5) return true;        // Max rounds
  if (prevIssues === 0) return false; // No baseline yet (avoids divide-by-zero)
  const improvement = (prevIssues - issues) / prevIssues;
  if (improvement < 0.1) return true; // Stalled: less than 10% issue reduction
  return false;
}
```
Cost vs Quality Trade-offs
More rounds cost more but catch more issues:
| Rounds | Typical Cost | Issues Caught |
|---|---|---|
| 1 | $0.05-0.15 | 0% (no review) |
| 2 | $0.10-0.30 | 60-70% |
| 3 | $0.15-0.45 | 80-85% |
| 4 | $0.20-0.60 | 90-95% |
| 5 | $0.25-0.75 | 95%+ |
For high-stakes content, 3-5 rounds are worth the cost. For drafts and internal docs, 1-2 rounds suffice.
Common Pitfalls
1. Critic Too Harsh
Problem: Critic never approves, creates infinite loops.
Fix: Set maximum rounds. Accept “good enough.” Have critic distinguish critical vs minor issues.
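One concrete way to encode "distinguish critical vs minor" is to block approval only on critical issues. The `Issue` shape and severity labels below are an assumed convention, sketched for illustration.

```typescript
interface Issue {
  severity: "critical" | "minor";
  description: string;
}

// Approve when no critical issues remain, even if minor nits exist;
// this keeps a perfectionist critic from looping forever.
function verdictFor(issues: Issue[]): "APPROVED" | "REVISE" {
  return issues.some((i) => i.severity === "critical") ? "REVISE" : "APPROVED";
}
```

Minor issues can still be passed to the actor as optional polish, but they no longer gate termination.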
2. Critic Too Lenient
Problem: Critic approves everything, provides no value.
Fix: Use adversarial prompting. Require specific issue counts. Set quality thresholds.
3. Lost Context Between Rounds
Problem: Actor forgets original requirements when revising.
Fix: Include original task in every revision prompt. Use session-based agents.
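The first fix can be mechanical: build every revision prompt from a template that restates the original task. A minimal sketch, with illustrative wording:

```typescript
// Sketch: every revision prompt carries the original task verbatim,
// so the actor cannot drift from the initial requirements.
function buildRevisionPrompt(task: string, draft: string, feedback: string): string {
  return [
    "Original task (must still be satisfied in full):",
    task,
    "",
    "Current draft:",
    draft,
    "",
    "Reviewer feedback:",
    feedback,
    "",
    "Revise the draft to address the feedback without dropping any original requirement.",
  ].join("\n");
}
```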
4. Over-Engineering
Problem: Each round adds complexity without improving quality.
Fix: Critic should enforce simplicity. “Only fix real issues, don’t add features.”
Integration with RALPH
The RALPH loop uses actor-critic at multiple levels:
- Micro-level: Within each task, code/content is generated then reviewed
- Macro-level: Every 6 iterations, review agents run across all output
- Meta-level: Progress summarizer evaluates overall quality metrics
This multi-scale application of actor-critic creates compounding quality.
Related
- Actor-Critic Adversarial Coding – Deep dive on code-specific critique with 8 dimensions
- Agent Swarm Patterns – Multiple critics for diverse perspectives
- Sub-Agent Architecture – Orchestrating multiple specialized agents
- Quality Gates as Information Filters – Actor-critic as pre-gate quality improvement
- Trust But Verify Protocol – Verification before accepting AI output

