Sub-agents: Accuracy vs Latency Trade-off

James Phoenix

Sub-agents trade latency for accuracy. Use them when correctness matters more than speed.

Ready to implement? See Full Architecture Guide for implementation patterns.


What Are Sub-agents?

Sub-agents are specialized AI assistants that Claude Code can delegate tasks to. Each sub-agent:

  • Has a specific purpose and expertise area
  • Uses its own context window (separate from main conversation)
  • Can be configured with specific tools
  • Includes a custom system prompt guiding behavior

The Core Trade-off

┌─────────────────────────────────────────────────┐
│                                                 │
│   Accuracy ◄──────────────────────► Latency    │
│                                                 │
│   Sub-agent:     ████████████░░░░   High/High  │
│   Main agent:    ██████░░░░░░░░░░   Med/Low    │
│   Script:        ████░░░░░░░░░░░░   Low/None   │
│                                                 │
└─────────────────────────────────────────────────┘
| Approach | Accuracy | Latency | Token Cost |
| --- | --- | --- | --- |
| Sub-agent | High (fresh context, specialized) | High (cold start, gathering context) | Higher |
| Main agent | Medium (context pollution) | Low (already running) | Medium |
| Script | Fixed (deterministic) | None | Zero |

Why Sub-agents Are More Accurate

1. Fresh Context Window

Main conversation accumulates noise. Sub-agents start clean.

Main conversation (50k tokens):
- Previous debugging session
- Unrelated file reads
- Abandoned approaches
- Old error messages
        ↓
    Context pollution = degraded performance

Sub-agent context (5k tokens):
- Just the task description
- Only relevant files
- Focused system prompt
        ↓
    Clean context = better reasoning

2. Specialized System Prompts

Sub-agents can have detailed, task-specific instructions:

---
name: security-reviewer
description: Security audit specialist. Use proactively after code changes.
tools: Read, Grep, Glob, Bash
---

You are a security expert reviewing code for vulnerabilities.

Focus on:
- OWASP Top 10 vulnerabilities
- Input validation gaps
- Authentication/authorization flaws
- Secrets exposure
- Injection risks

For each finding:
1. Severity (Critical/High/Medium/Low)
2. File and line number
3. Specific vulnerability type
4. Exploitation scenario
5. Remediation code

A main agent can’t hold this level of specialization for every domain.

3. Tool Restriction

Limiting tools focuses the agent:

# Code reviewer - read only
tools: Read, Grep, Glob, Bash

# Fixer - can edit
tools: Read, Edit, Bash, Grep, Glob

# Deployer - specific access
tools: Bash

Fewer tools = less decision paralysis = better execution.


When Sub-agents Win

High-Stakes Decisions

Security review before production deploy
        ↓
    Use sub-agent (accuracy > speed)

Complex Analysis

Analyze entire codebase for performance issues
        ↓
    Use sub-agent (clean context for large scope)

Specialized Domains

Database query optimization
        ↓
    Use specialized sub-agent with SQL expertise

When Main Agent Wins

Quick Iterations

Fix this typo
        ↓
    Main agent (speed matters; sub-agent accuracy is overkill)

Context Already Loaded

Continue the refactor we started
        ↓
    Main agent (has the context)

Simple Tasks

Run the tests
        ↓
    Main agent or script (no specialization needed)

The Latency Cost

Sub-agents add latency because they:

  1. Cold start – Initialize new context
  2. Gather context – Re-read files main agent already knows
  3. Build understanding – Can’t leverage prior conversation
Main agent task: 5 seconds (context ready)
Sub-agent task: 15-30 seconds (must gather context)
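One way to reason about this trade-off is expected time, not raw time: a faster main agent that gets the answer wrong pays a rework cost. A minimal back-of-envelope sketch, where every number (run times, error rates, rework cost) is an illustrative assumption, not a measurement:

```python
# Back-of-envelope model: when does a sub-agent's extra latency pay off?
# All numbers below are illustrative assumptions, not measured values.

def expected_time(run_seconds: float, error_rate: float, rework_seconds: float) -> float:
    """Expected wall-clock time, including the cost of redoing a wrong answer."""
    return run_seconds + error_rate * rework_seconds

# Main agent: fast, but a polluted context raises the chance of a wrong fix.
main = expected_time(run_seconds=5, error_rate=0.30, rework_seconds=120)

# Sub-agent: slower cold start, but fresh context lowers the error rate.
sub = expected_time(run_seconds=25, error_rate=0.05, rework_seconds=120)

print(f"main agent: {main:.0f}s expected")  # 5 + 0.30 * 120 = 41s
print(f"sub-agent:  {sub:.0f}s expected")   # 25 + 0.05 * 120 = 31s
```

Under these assumed numbers the sub-agent wins despite being five times slower per run; with a cheap rework cost (a typo fix), the inequality flips.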

Mitigation: Resumable sub-agents can continue previous conversations:

> Resume agent abc123 and continue the analysis

Built-in Sub-agents

Claude Code includes these out of the box:

Explore (Fast, Read-only)

Model: Haiku (fast)
Tools: Glob, Grep, Read, Bash (read-only)
Purpose: Quick codebase exploration

General-purpose (Capable, Full Access)

Model: Sonnet
Tools: All
Purpose: Complex multi-step tasks

Plan (Research for Planning)

Model: Sonnet
Tools: Read, Glob, Grep, Bash
Purpose: Gather context for planning

Creating Custom Sub-agents

File Location

.claude/agents/         # Project-level (highest priority)
~/.claude/agents/       # User-level (lower priority)

Template

---
name: your-agent-name
description: When to use this agent. Use proactively for X.
tools: Tool1, Tool2, Tool3
model: sonnet  # or haiku, opus, inherit
---

You are an expert in [domain].

When invoked:
1. First step
2. Second step
3. Third step

Focus on:
- Key consideration 1
- Key consideration 2

Output format:
- Findings with file:line references
- Severity ratings
- Specific recommendations
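If you create agents often, the template above can be scaffolded with a short script. This is a sketch: the `scaffold_agent` helper is hypothetical, but the file locations and frontmatter fields match the examples in this section.

```python
from pathlib import Path

def scaffold_agent(name: str, description: str, tools: list[str],
                   prompt: str, model: str = "sonnet",
                   project_level: bool = True) -> Path:
    """Write a sub-agent definition in the Markdown-with-frontmatter format.

    Project-level agents (.claude/agents/) take priority over
    user-level ones (~/.claude/agents/).
    """
    base = Path(".claude/agents") if project_level else Path.home() / ".claude/agents"
    base.mkdir(parents=True, exist_ok=True)
    body = (
        "---\n"
        f"name: {name}\n"
        f"description: {description}\n"
        f"tools: {', '.join(tools)}\n"
        f"model: {model}\n"
        "---\n\n"
        f"{prompt}\n"
    )
    path = base / f"{name}.md"
    path.write_text(body)
    return path

# Example: a read-only reviewer agent.
scaffold_agent(
    name="security-reviewer",
    description="Security audit specialist. Use proactively after code changes.",
    tools=["Read", "Grep", "Glob", "Bash"],
    prompt="You are a security expert reviewing code for vulnerabilities.",
)
```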

Example: Security + Performance Swarm

Combine multiple specialized sub-agents:

> Run security-reviewer on src/auth/
> Run performance-analyzer on src/api/
> Run test-coverage-checker on src/

Aggregate findings, prioritize by severity.

Each sub-agent:

  • Has specialized expertise
  • Works in clean context
  • Returns focused results

Combined: higher accuracy than one generalist agent.
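The aggregation step can be sketched in a few lines. The `Finding` shape and the severity scale mirror the security-reviewer output format above; the field names and sample findings are illustrative:

```python
from dataclasses import dataclass

SEVERITY_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}

@dataclass
class Finding:
    agent: str      # which sub-agent reported it
    severity: str   # Critical / High / Medium / Low
    location: str   # file:line reference
    summary: str

def aggregate(*reports: list[Finding]) -> list[Finding]:
    """Flatten per-agent reports into one list, worst issues first."""
    merged = [f for report in reports for f in report]
    return sorted(merged, key=lambda f: SEVERITY_ORDER[f.severity])

security = [Finding("security-reviewer", "High", "src/auth/login.py:42", "SQL injection")]
perf = [Finding("performance-analyzer", "Medium", "src/api/users.py:10", "N+1 query")]

for f in aggregate(security, perf):
    print(f"[{f.severity}] {f.location} - {f.summary} ({f.agent})")
```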


The Decision Framework

Is the task simple and context is fresh?
    YES → Main agent
    NO  ↓

Is it a repeated workflow?
    YES → Script (see: ad-hoc-to-scripts)
    NO  ↓

Does it need specialized expertise?
    YES → Custom sub-agent
    NO  ↓

Does it need clean context for complex reasoning?
    YES → Sub-agent
    NO  → Main agent
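The flowchart above maps directly onto a function. The inputs are per-task judgment calls you make yourself; the ordering of the checks matches the framework:

```python
def choose_executor(simple_and_fresh: bool, repeated_workflow: bool,
                    needs_expertise: bool, needs_clean_context: bool) -> str:
    """Pick an executor for a task, following the decision framework in order."""
    if simple_and_fresh:
        return "main agent"       # quick task, context already loaded
    if repeated_workflow:
        return "script"           # deterministic repetition beats an agent
    if needs_expertise:
        return "custom sub-agent" # specialized system prompt + restricted tools
    if needs_clean_context:
        return "sub-agent"        # fresh context window for complex reasoning
    return "main agent"           # default: speed wins

# "Security review before production deploy"
print(choose_executor(False, False, True, True))  # custom sub-agent
```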

Key Principle

Sub-agents are for accuracy. Main agent is for speed. Scripts are for repetition.

Choose based on what matters most for the task at hand.


Topics
AI Specialization, Claude Integration, Context Windows, Latency vs Accuracy, Sub-Agents
