Summary
Match AI model capabilities to task complexity for optimal cost-quality balance. Use Haiku for simple tasks (file reads, grep, simple edits), Sonnet for standard development (API endpoints, refactoring), and Opus for complex architecture work (system design, large refactors). This achieves a 40-70% cost reduction while maintaining quality by routing 60-80% of tasks to cheaper models.
The Problem
Using a single model for all tasks is wasteful: simple tasks (file reads, grep searches) waste expensive model capacity, while complex tasks (architecture decisions, large refactors) may fail with cheaper models. This creates a cost-quality dilemma: either overpay for simple tasks or underdeliver on complex ones. Most teams default to mid-tier models for everything, missing 40-70% cost savings opportunities.
The Solution
Implement dynamic model switching based on task complexity: Haiku ($0.25/MTok) for simple operations (80% of tasks), Sonnet ($3/MTok) for standard development (15% of tasks), Opus ($15/MTok) for complex architecture (5% of tasks). Route tasks intelligently using complexity heuristics: file operations → Haiku, feature implementation → Sonnet, system design → Opus. Result: 40-70% cost reduction with same or better quality through optimal model-task matching.
The Problem
When working with AI coding agents, you face a fundamental tradeoff: model capability vs cost.
The Cost-Quality Spectrum
Claude Haiku (Cheapest):
- Cost: $0.25 per 1M input tokens, $1.25 per 1M output tokens
- Capabilities: Fast, simple tasks (file reads, basic edits, grep searches)
- Limitations: Struggles with complex architecture, multi-file refactors
Claude Sonnet (Mid-tier):
- Cost: $3 per 1M input tokens, $15 per 1M output tokens
- Capabilities: Most development tasks (API endpoints, features, refactoring)
- Limitations: May miss nuanced architecture concerns
Claude Opus (Most Capable):
- Cost: $15 per 1M input tokens, $75 per 1M output tokens
- Capabilities: Complex architecture, system design, large refactors
- Limitations: Expensive for simple tasks (overkill)
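For reference, these prices can be captured in a small lookup plus a cost helper (a TypeScript sketch; the figures match the list above, but verify against Anthropic's current pricing page before depending on them):
// Per-million-token prices (USD) as quoted above
const PRICING = {
  haiku:  { input: 0.25, output: 1.25 },
  sonnet: { input: 3,    output: 15 },
  opus:   { input: 15,   output: 75 },
} as const;

// Cost of a single request in USD
function requestCost(
  model: keyof typeof PRICING,
  inputTokens: number,
  outputTokens: number
): number {
  const p = PRICING[model];
  return (inputTokens / 1_000_000) * p.input + (outputTokens / 1_000_000) * p.output;
}

requestCost('sonnet', 5_000, 1_000); // 0.015 + 0.015 = $0.03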
The Single-Model Problem
Most teams pick one model and use it for everything:
Approach 1: Always use Haiku (cheap)
✅ Cost: Very low
❌ Quality: Poor on complex tasks
❌ Developer frustration: High (re-work needed)
Example failure:
"Refactor authentication system to use OAuth"
→ Haiku produces incomplete implementation
→ Missing edge cases, security issues
→ Requires manual fixes or re-generation
Approach 2: Always use Sonnet (balanced)
✅ Cost: Moderate
✅ Quality: Good on most tasks
❌ Missed savings: 40-60% overspending on simple tasks
Example waste:
"Read file and find function getUserById"
→ Sonnet costs $3/MTok
→ Haiku could do this for $0.25/MTok (12x cheaper)
→ Wasted $2.75 per million tokens
Approach 3: Always use Opus (best quality)
✅ Quality: Excellent on all tasks
❌ Cost: Very high (5-60x more expensive than needed)
❌ Slow: Opus is slower than Haiku/Sonnet
Example waste:
"Add console.log to function"
→ Opus costs $15/MTok
→ Haiku could do this for $0.25/MTok (60x cheaper)
→ Wasted $14.75 per million tokens
The Real-World Impact
Consider a typical development day:
Task Breakdown (100 AI requests/day):
- Simple tasks: 60 requests (file reads, grep, simple edits)
- Medium tasks: 30 requests (feature implementation, refactoring)
- Complex tasks: 10 requests (architecture decisions, system design)
Costs if using only Sonnet (5K tokens per request):
100 requests × 5K tokens × $0.003 = $1.50/day
× 22 work days = $33/month per developer
× 5 developers = $165/month = $1,980/year
Costs with model switching:
Simple (Haiku): 60 × 5K × $0.00025 = $0.075/day
Medium (Sonnet): 30 × 5K × $0.003 = $0.45/day
Complex (Opus): 10 × 5K × $0.015 = $0.75/day
Total: $1.275/day (15% reduction)
× 22 work days = $28/month per developer
× 5 developers = $140/month = $1,680/year
Savings: $300/year (15%)
Wait, only 15%? Let’s optimize further…
The key insight: Most tasks are simpler than you think.
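To see why, compute the blended daily cost as a function of the task mix (a minimal sketch reusing the PRICING table from earlier; input prices only, as in the worksheet above):
type Mix = { haiku: number; sonnet: number; opus: number };

// Blended daily cost for a task mix, at 5K input tokens per request
function dailyCost(requestsPerDay: Mix, tokensPerRequest = 5_000): number {
  return (Object.keys(requestsPerDay) as Array<keyof Mix>).reduce(
    (sum, model) =>
      sum + requestsPerDay[model] * (tokensPerRequest / 1_000_000) * PRICING[model].input,
    0
  );
}

dailyCost({ haiku: 60, sonnet: 30, opus: 10 }); // 1.275 -- the 15% case above
dailyCost({ haiku: 80, sonnet: 15, opus: 5 });  // 0.70  -- 53% below the $1.50 baseline
The savings come almost entirely from how many requests shift down to Haiku, which is what the classification framework below is for.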
The Solution
Implement dynamic model switching based on task complexity.
Task Classification Framework
Tier 1: Haiku Tasks (60-80% of requests)
Characteristics:
- Single-file operations
- No complex logic
- Clear, deterministic output
- Fast execution required
Examples:
# File operations
"Read src/api/users/get-user.ts"
"List all files in src/components/"
"Find imports of UserService"
# Simple searches
"Grep for function getUserById"
"Find all TODO comments"
"Search for console.log statements"
# Basic edits
"Add type annotation to parameter 'id'"
"Rename variable 'user' to 'currentUser'"
"Add error logging to catch block"
# Documentation
"Add JSDoc comment to this function"
"Update README with new API endpoint"
"Fix typo in comment"
Cost: $0.25/MTok input, $1.25/MTok output
When to use:
- Task is purely informational (read-only)
- Task affects single file/function
- Task has clear right answer
- Task requires speed over depth
Tier 2: Sonnet Tasks (15-30% of requests)
Characteristics:
- Multi-file operations
- Moderate logic complexity
- Requires context understanding
- Standard development work
Examples:
// Feature implementation
"Add new API endpoint POST /api/users/update-profile"
"Implement email validation with Zod schema"
"Create React component for user profile card"
// Refactoring
"Extract validation logic into separate function"
"Refactor getUserById to use Result<T, E> pattern"
"Move database queries to repository layer"
// Bug fixes
"Fix race condition in async user fetch"
"Handle edge case where email is null"
"Add missing error handling to API route"
// Testing
"Write integration tests for user authentication"
"Add test cases for email validation edge cases"
"Mock database calls in user service tests"
Cost: $3/MTok input, $15/MTok output
When to use:
- Task spans 2-5 files
- Task requires understanding patterns
- Task involves business logic
- Task needs moderate context
Tier 3: Opus Tasks (5-15% of requests)
Characteristics:
- System-wide changes
- Architecture decisions
- Complex multi-step refactors
- High-stakes implementations
Examples:
// Architecture
"Design authentication system with OAuth + JWT"
"Plan migration from monolith to microservices"
"Architect real-time notification system"
// Large refactors
"Refactor entire API layer to use tRPC"
"Migrate database from MongoDB to PostgreSQL"
"Convert class-based codebase to functional patterns"
// Complex features
"Implement payment processing with Stripe"
"Build real-time collaborative editing"
"Create caching layer with Redis"
// System debugging
"Diagnose memory leak across multiple services"
"Fix race condition in distributed lock"
"Resolve deadlock in database transaction"
Cost: $15/MTok input, $75/MTok output
When to use:
- Task affects 6+ files
- Task requires deep architecture knowledge
- Task has security/performance implications
- Task is mission-critical
Decision Tree
Task complexity?
│
├─ "Read file" / "Find pattern" / "Simple edit"
│ └─> Haiku (Tier 1)
│
├─ "Implement feature" / "Refactor function" / "Write tests"
│ ├─ Affects 1 file?
│ │ └─> Haiku (Tier 1)
│ ├─ Affects 2-5 files?
│ │ └─> Sonnet (Tier 2)
│ └─ Affects 6+ files?
│ └─> Opus (Tier 3)
│
├─ "Design system" / "Large refactor" / "Architecture"
│ └─> Opus (Tier 3)
│
└─ "Debug production" / "Security issue" / "Performance critical"
└─> Opus (Tier 3)
Automated Model Selection
Implement heuristics to automatically route tasks:
type ModelTier = 'haiku' | 'sonnet' | 'opus';
interface TaskComplexity {
filesAffected: number;
linesOfCode: number;
requiresArchitecture: boolean;
securityCritical: boolean;
performanceCritical: boolean;
multiStepPlan: boolean;
}
function selectModel(task: string, complexity: TaskComplexity): ModelTier {
// Security/performance always uses Opus
if (complexity.securityCritical || complexity.performanceCritical) {
return 'opus';
}
// Architecture decisions use Opus
if (complexity.requiresArchitecture || complexity.multiStepPlan) {
return 'opus';
}
// Large refactors use Opus
if (complexity.filesAffected > 5 || complexity.linesOfCode > 500) {
return 'opus';
}
// Multi-file work uses Sonnet
if (complexity.filesAffected > 1 || complexity.linesOfCode > 50) {
return 'sonnet';
}
// Simple operations use Haiku
const simplePatterns = [
/^read /i,
/^find /i,
/^grep /i,
/^list /i,
/^search /i,
/^show /i,
/^add (comment|jsdoc|type)/i,
/^rename (variable|function)/i,
];
if (simplePatterns.some(pattern => pattern.test(task))) {
return 'haiku';
}
// Default to Sonnet for unknown complexity
return 'sonnet';
}
// Usage
const task = "Read src/api/users.ts and find getUserById function";
const complexity = {
filesAffected: 1,
linesOfCode: 0,
requiresArchitecture: false,
securityCritical: false,
performanceCritical: false,
multiStepPlan: false,
};
const model = selectModel(task, complexity);
console.log(model); // "haiku"
Progressive Model Escalation
Start with cheaper models, escalate if needed:
async function executeWithEscalation(task: string): Promise<Result> {
// Try Haiku first
const haikuResult = await executeTask(task, 'haiku');
if (haikuResult.confidence > 0.8 && haikuResult.quality > 0.9) {
return haikuResult; // Success with cheap model
}
// Haiku struggled, try Sonnet
console.log('Escalating to Sonnet for better quality');
const sonnetResult = await executeTask(task, 'sonnet');
if (sonnetResult.confidence > 0.8 && sonnetResult.quality > 0.9) {
return sonnetResult; // Success with mid-tier
}
// Sonnet struggled, use Opus
console.log('Escalating to Opus for complex task');
const opusResult = await executeTask(task, 'opus');
return opusResult; // Use best model as fallback
}
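The sketch above assumes two things you would supply yourself: an executeTask helper that calls the chosen model, and a Result shape carrying confidence and quality scores, roughly:
// Hypothetical contracts assumed by executeWithEscalation
interface Result {
  output: string;
  confidence: number; // model's self-reported confidence, 0-1
  quality: number;    // score from automated checks (tests, lint, types), 0-1
}

declare function executeTask(
  task: string,
  model: 'haiku' | 'sonnet' | 'opus'
): Promise<Result>;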
Benefits:
- Try cheap model first (saves cost if it works)
- Automatic escalation ensures quality
- Learning: track which tasks need escalation
Model-Specific Strengths
Haiku excels at:
- File I/O operations
- Pattern matching (grep, find)
- Simple text transformations
- Documentation updates
- Quick edits to existing code
- AST navigation
Sonnet excels at:
- Feature implementation
- Standard refactoring
- Test writing
- API endpoint creation
- Bug fixes
- Code review
Opus excels at:
- System design
- Architecture decisions
- Complex refactoring
- Performance optimization
- Security hardening
- Multi-service debugging
Implementation
Step 1: Audit Current Usage
Track your tasks for 1 week:
interface TaskLog {
  task: string;
  model: ModelTier;
  tokensUsed: number;
  duration: number;
  cost: number;       // dollars, computed when the task is logged
  success: boolean;
  escalated: boolean; // true if the task had to be retried on a stronger model
}
const logs: TaskLog[] = [];
function logTask(task: TaskLog) {
logs.push(task);
// Daily summary
  // Approximation: uses input-token prices ($/MTok) only; output tokens cost more
  const dailyCost = logs.reduce((sum, t) => {
    const costPer1M = t.model === 'haiku' ? 0.25 : t.model === 'sonnet' ? 3 : 15;
    return sum + (t.tokensUsed / 1_000_000) * costPer1M;
}, 0);
console.log(`Daily cost: $${dailyCost.toFixed(2)}`);
}
Analyze:
- What % of tasks are simple (Haiku-eligible)?
- What % need Sonnet?
- What % truly need Opus?
Typical findings:
- 60-80% of tasks could use Haiku
- 15-30% need Sonnet
- 5-15% need Opus
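A minimal way to pull those percentages out of the audit log (assuming the TaskLog records above):
// Share of logged tasks handled by each model tier
function modelDistribution(logs: TaskLog[]): Record<string, string> {
  const counts: Record<string, number> = {};
  for (const log of logs) {
    counts[log.model] = (counts[log.model] ?? 0) + 1;
  }
  return Object.fromEntries(
    Object.entries(counts).map(([model, n]) => [
      model,
      `${((n / logs.length) * 100).toFixed(0)}%`,
    ])
  );
}

console.log(modelDistribution(logs)); // e.g. { haiku: "68%", sonnet: "27%", opus: "5%" }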
Step 2: Implement Classification Rules
Create a classifier based on audit:
const TASK_PATTERNS = {
haiku: [
/^(read|show|list|find|grep|search)/i,
/^add (comment|jsdoc|type|log)/i,
/^fix (typo|comment|import)/i,
/single file/i,
/rename variable/i,
],
opus: [
/^(design|architect|plan|migrate)/i,
/security|auth|payment|crypto/i,
/performance|optimize|scale/i,
/refactor (entire|all|system)/i,
/multiple services/i,
],
// Everything else → Sonnet (default)
};
function classifyTask(task: string): ModelTier {
// Check Opus patterns first (highest priority)
if (TASK_PATTERNS.opus.some(p => p.test(task))) {
return 'opus';
}
// Check Haiku patterns
if (TASK_PATTERNS.haiku.some(p => p.test(task))) {
return 'haiku';
}
// Default to Sonnet
return 'sonnet';
}
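A few examples of how the rules fall out:
classifyTask('Read src/api/users.ts');                       // "haiku"  (read pattern)
classifyTask('Design authentication system with OAuth');     // "opus"   (design + auth patterns)
classifyTask('Implement email validation with Zod schema');  // "sonnet" (default)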
Step 3: Add Quality Checks
Verify cheaper models produce acceptable quality:
interface QualityMetrics {
syntaxValid: boolean;
testsPass: boolean;
lintPasses: boolean;
typesValid: boolean;
}
async function executeWithQualityCheck(
task: string,
model: ModelTier
): Promise<{ result: string; metrics: QualityMetrics }> {
const result = await executeTask(task, model);
const metrics = {
syntaxValid: await checkSyntax(result),
testsPass: await runTests(result),
lintPasses: await runLint(result),
typesValid: await checkTypes(result),
};
const qualityScore = Object.values(metrics).filter(Boolean).length / 4;
// If quality is low, consider escalating
if (qualityScore < 0.75 && model !== 'opus') {
console.log(`Quality score ${qualityScore} too low, escalating model`);
const nextModel = model === 'haiku' ? 'sonnet' : 'opus';
return executeWithQualityCheck(task, nextModel);
}
return { result, metrics };
}
Step 4: Monitor and Optimize
Track model performance over time:
interface ModelPerformance {
model: ModelTier;
tasksCompleted: number;
successRate: number;
avgCost: number;
avgDuration: number;
escalationRate: number; // How often it needed a better model
}
function analyzePerformance(logs: TaskLog[]): ModelPerformance[] {
  const byModel = groupBy(logs, 'model'); // e.g. lodash's groupBy, or hand-rolled
return Object.entries(byModel).map(([model, tasks]) => ({
model: model as ModelTier,
tasksCompleted: tasks.length,
successRate: tasks.filter(t => t.success).length / tasks.length,
avgCost: tasks.reduce((sum, t) => sum + t.cost, 0) / tasks.length,
avgDuration: tasks.reduce((sum, t) => sum + t.duration, 0) / tasks.length,
escalationRate: tasks.filter(t => t.escalated).length / tasks.length,
}));
}
// Monthly report
const performance = analyzePerformance(lastMonthLogs);
console.log('Model Performance:');
performance.forEach(p => {
console.log(`${p.model}:`);
console.log(` Success Rate: ${(p.successRate * 100).toFixed(1)}%`);
console.log(` Avg Cost: $${p.avgCost.toFixed(4)}`);
console.log(` Escalation Rate: ${(p.escalationRate * 100).toFixed(1)}%`);
});
Optimize based on metrics:
- High escalation rate: Your Haiku rules are too aggressive
- Low Haiku usage: You’re missing cost savings opportunities
- High Opus usage: You’re overestimating task complexity
Cost Savings Analysis
Scenario 1: Individual Developer
Baseline (100% Sonnet):
100 requests/day × 5K tokens × $0.003 = $1.50/day
× 22 workdays = $33/month = $396/year
With model switching (70% Haiku, 25% Sonnet, 5% Opus):
Haiku: 70 × 5K × $0.00025 = $0.0875/day
Sonnet: 25 × 5K × $0.003 = $0.375/day
Opus: 5 × 5K × $0.015 = $0.375/day
Total: $0.8375/day × 22 = $18.43/month = $221/year
Savings: $175/year (44% reduction)
Scenario 2: Small Team (5 developers)
Baseline: 5 × $396 = $1,980/year
With switching: 5 × $221 = $1,105/year
Savings: $875/year (44% reduction)
Scenario 3: Large Team (20 developers)
Baseline: 20 × $396 = $7,920/year
With switching: 20 × $221 = $4,420/year
Savings: $3,500/year (44% reduction)
Optimized Scenario (Aggressive Haiku Usage)
Task distribution (80% Haiku, 15% Sonnet, 5% Opus):
Haiku: 80 × 5K × $0.00025 = $0.10/day
Sonnet: 15 × 5K × $0.003 = $0.225/day
Opus: 5 × 5K × $0.015 = $0.375/day
Total: $0.70/day × 22 = $15.40/month = $185/year per developer
Savings vs baseline: $396 - $185 = $211/year (53% reduction)
For 20 developers: $4,220/year savings (53% reduction)
Best Practices
1. Start Conservative, Optimize Over Time
Week 1: Use Sonnet for everything (baseline)
Week 2: Enable Haiku for obvious simple tasks
if (task.startsWith('read') || task.startsWith('find')) {
model = 'haiku';
}
Week 3: Expand Haiku coverage based on success rate
Week 4: Add Opus for complex tasks
Month 2: Fine-tune based on escalation rates
2. Use Quality Gates to Validate Cheaper Models
Don’t trust Haiku blindly:
const result = await haiku.execute(task);
// Verify quality
if (!result.syntaxValid || !result.testsPass) {
console.log('Haiku failed quality check, escalating');
return sonnet.execute(task);
}
return result; // Haiku succeeded, saved money
3. Track Model Confidence
LLMs can be prompted to report a confidence estimate; treat it as a heuristic, not a calibrated probability:
interface ModelResult {
output: string;
confidence: number; // 0-1
reasoning: string;
}
const result = await haiku.execute(task);
if (result.confidence < 0.7) {
// Haiku is unsure, use better model
return sonnet.execute(task);
}
4. Learn from Escalations
interface Escalation {
task: string;
fromModel: ModelTier;
toModel: ModelTier;
reason: string;
}
const escalations: Escalation[] = [];
function recordEscalation(esc: Escalation) {
  escalations.push(esc);
  // Find patterns
  const commonPatterns = findCommonPatterns(escalations); // hypothetical analysis helper
  console.log('Tasks that frequently escalate:', commonPatterns);
  // Update classifier; sonnet is the fall-through default and has no pattern list
  if (esc.toModel !== 'sonnet') {
    commonPatterns.forEach(pattern => {
      TASK_PATTERNS[esc.toModel].push(pattern);
    });
  }
}
5. Combine with Prompt Caching
Cache context across model switches:
// Same cached context works for all models
const cachedContext = `
# CLAUDE.md content
# Schema definitions
# Coding standards
`;
// Haiku request
await haiku.execute({
context: cachedContext, // Cache read: ~$0.025/MTok (10% of Haiku input price)
task: "Read file and find getUserById",
});
// Sonnet request (reuses cache)
await sonnet.execute({
context: cachedContext, // Already cached, no extra cost
task: "Refactor getUserById to use Result<T, E>",
});
Combined savings: the discounts multiply. 40-70% from model switching plus 90% from caching gives 1 - (1 - 0.4)(1 - 0.9) = 94% up to 1 - (1 - 0.7)(1 - 0.9) = 97% total cost reduction on repeated context.
6. Consider Latency Tradeoffs
Haiku: ~1-2 seconds (fastest)
Sonnet: ~2-4 seconds (medium)
Opus: ~4-8 seconds (slowest)
For time-sensitive tasks, prefer Haiku even if Sonnet might be slightly better:
if (task.urgent && estimatedComplexity < 0.7) {
return 'haiku'; // Favor speed
}
7. Use Opus Strategically
Opus is 5-60x more expensive than the other tiers, so use it sparingly:
✅ Good Opus uses:
- Architecture decisions (affects entire system)
- Security implementations (mistakes are costly)
- Performance optimization (expertise needed)
- Complex refactors (high risk of breaking)
❌ Bad Opus uses:
- "Just to be safe" (unnecessary)
- Simple tasks (wasteful)
- Routine development (overkill)
Common Pitfalls
❌ Pitfall 1: Over-Optimizing Too Early
Problem: Trying to use Haiku for everything to save money
Result: Poor quality, high escalation rate, developer frustration
Solution: Start with Sonnet, gradually increase Haiku usage as you learn
❌ Pitfall 2: Not Tracking Escalations
Problem: No visibility into when/why escalations happen
Result: Can’t improve classifier over time
Solution: Log all escalations with reasons, analyze monthly
❌ Pitfall 3: Ignoring Quality Metrics
Problem: Assuming cheaper model = acceptable quality
Result: Bugs slip through, tests fail, re-work needed
Solution: Always run quality gates (tests, linter, type checker)
❌ Pitfall 4: Static Classification Rules
Problem: Rules never adapt to actual usage patterns
Result: Suboptimal model selection over time
Solution: Use ML-based classifier that learns from escalations
❌ Pitfall 5: Not Considering Context Size
Problem: Large context makes Haiku less cost-effective
Result: Paying for expensive context with cheap model
Example:
Haiku: $0.25/MTok input
Sonnet: $3/MTok input
Task with 100K context tokens:
Haiku: 100K × $0.00000025 = $0.025
Sonnet: 100K × $0.000003 = $0.30
Difference: $0.275 (but Sonnet might handle task better)
If Haiku needs 3 retries:
Haiku: $0.025 × 3 = $0.075 (still cheaper, but slower)
If Sonnet succeeds first try:
Sonnet: $0.30 (more expensive, but faster)
Consider: is saving $0.225 worth the latency of two extra retries?
Solution: Factor expected retry cost into model selection, as sketched below
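One way to make that concrete is to compare expected cost rather than raw price (a sketch; the per-attempt costs are the 100K-context figures above, and success rates would come from your escalation logs):
// Expected cost of completing a task, given a per-attempt cost and an
// estimated probability that a single attempt succeeds. Expected attempts
// for a geometric distribution is 1 / successRate.
function expectedCost(costPerAttempt: number, successRate: number): number {
  return costPerAttempt / successRate;
}

expectedCost(0.025, 1 / 3); // Haiku: $0.075 expected (3 attempts on average)
expectedCost(0.3, 1.0);     // Sonnet: $0.30 expected (first try)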
Integration with Other Patterns
Combine with Hierarchical CLAUDE.md
Smaller context = more cost-effective Haiku usage:
❌ Large context (10K tokens):
Haiku cost: 10K × $0.00000025 = $0.0025
Sonnet cost: 10K × $0.000003 = $0.03
Difference: $0.0275
✅ Hierarchical context (2K tokens):
Haiku cost: 2K × $0.00000025 = $0.0005
Sonnet cost: 2K × $0.000003 = $0.006
Difference: $0.0055
Benefit: Haiku is even more attractive with smaller context
Combine with Quality Gates
Use gates to validate cheaper models:
// Try Haiku first
const haikuResult = await haiku.execute(task);
// Run through quality gates
const gates = [
checkSyntax,
checkTypes,
runLinter,
runTests,
];
for (const gate of gates) {
if (!await gate(haikuResult)) {
// Haiku failed, escalate
return sonnet.execute(task);
}
}
// Haiku passed all gates, use it
return haikuResult;
Combine with Test-Based Regression Patching
Cheaper models work better with good test coverage:
Good test coverage (80%+):
→ Haiku can validate correctness via tests
→ Safe to use Haiku more often
Poor test coverage (<50%):
→ Harder to validate Haiku output
→ Need Sonnet/Opus for reliability
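A simple way to encode this is to let test coverage bump the minimum tier (a sketch; the 50% threshold is a judgment call, not a rule from this pattern):
// Low coverage means tests can't validate cheap-model output,
// so promote haiku picks to sonnet.
function adjustForCoverage(
  tier: 'haiku' | 'sonnet' | 'opus',
  coverage: number // 0-1
): 'haiku' | 'sonnet' | 'opus' {
  if (tier === 'haiku' && coverage < 0.5) {
    return 'sonnet';
  }
  return tier;
}

adjustForCoverage('haiku', 0.85); // "haiku"  -- tests can catch mistakes
adjustForCoverage('haiku', 0.40); // "sonnet" -- too risky to trust Haiku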
Measuring Success
Key Metrics
1. Cost Reduction
const costReduction = (baselineCost - actualCost) / baselineCost;
console.log(`Cost reduction: ${(costReduction * 100).toFixed(1)}%`);
// Target: 40-70% reduction
2. Quality Maintenance
const qualityScore = tasksPassingAllGates / totalTasks;
console.log(`Quality score: ${(qualityScore * 100).toFixed(1)}%`);
// Target: >95% passing
3. Escalation Rate
const escalationRate = tasksEscalated / totalTasks;
console.log(`Escalation rate: ${(escalationRate * 100).toFixed(1)}%`);
// Target: <20%
4. Developer Satisfaction
Survey questions:
- Are you satisfied with AI code quality?
- Do model switches cause delays?
- Do you trust automatic model selection?
Target: >80% satisfaction
Dashboard Example
interface ModelSwitchingDashboard {
period: string;
costReduction: number;
qualityScore: number;
modelDistribution: {
haiku: number;
sonnet: number;
opus: number;
};
escalationRate: number;
avgTaskDuration: number;
}
const dashboard: ModelSwitchingDashboard = {
period: '2025-11',
costReduction: 0.52, // 52%
qualityScore: 0.96, // 96%
modelDistribution: {
haiku: 0.72, // 72% of tasks
sonnet: 0.23, // 23% of tasks
opus: 0.05, // 5% of tasks
},
escalationRate: 0.14, // 14% needed escalation
avgTaskDuration: 3.2, // 3.2 seconds
};
Conclusion
Model switching is a high-impact, low-effort optimization for AI-assisted development costs.
Key Takeaways:
- Match model to task complexity: Haiku for simple, Sonnet for standard, Opus for complex
- Most tasks are simpler than you think: 60-80% can use Haiku
- Use quality gates to validate: Don’t trust cheaper models blindly
- Progressive escalation: Try cheap first, escalate if needed
- Track and optimize: Learn from escalations, improve over time
- Combine with caching: Cache context across model switches
- Reserve Opus for critical work: Architecture, security, performance
The result: 40-70% cost reduction while maintaining or improving quality through intelligent model-task matching.
For a team of 20 developers, that's roughly $3,500-4,200/year in savings that can be redirected to better tooling or infrastructure.
Related Concepts
- Prompt Caching Strategy – Combine caching with model switching for 94-97% total cost reduction
- Hierarchical Context Patterns – Smaller context makes Haiku more cost-effective
- Quality Gates as Information Filters – Use gates to validate cheaper model output
- Test-Based Regression Patching – Better test coverage enables more Haiku usage
- Context Debugging Framework – Layer 3 (Model Power) guides when to escalate models
- Plan Mode Strategic – Use planning to determine appropriate model tier before implementation
- Sub-Agent Architecture – Different specialized agents can use different model tiers based on task complexity
References
- Anthropic Pricing – Current pricing for Claude models (Haiku, Sonnet, Opus)
- Claude Model Comparison – Official documentation comparing Claude model capabilities

