## Summary

When AI doesn't produce the desired output, follow a systematic debugging hierarchy: Context (60% of issues) → Prompting (25%) → Model Power (10%) → Manual (5%). This protocol maximizes the probability of a fix by addressing root causes in order of likelihood, avoiding time wasted on model switching when the real issue is missing context or unclear instructions.
### The Problem
AI doesn’t produce the desired output, but it’s unclear why. Developers waste time trying random fixes (switching models, rephrasing prompts, adding more tokens) without a systematic approach. This trial-and-error debugging is inefficient and often misses the root cause: insufficient context accounts for 60% of AI failures, yet developers frequently jump to changing models (which fixes only 10% of issues).
### The Solution
Follow a hierarchical debugging protocol ordered by likelihood of fixing the issue: 1) Context Layer (60%): Add missing information, relevant files, examples, architecture diagrams. 2) Prompting Layer (25%): Refine instructions with specific examples, edge cases, success criteria. 3) Model Power (10%): Escalate to more powerful models for genuinely complex tasks. 4) Manual Override (5%): Recognize when human intuition is needed. This systematic approach resolves most issues at the context layer, saving time and cost.
## The Problem
You’re working with Claude Code, Cursor, or another AI coding agent. You give it a task, and it produces… something. But not what you wanted.
The frustration cycle begins:
- Try rephrasing the prompt
- Switch to a more powerful model
- Add more detailed instructions
- Try a different AI tool entirely
- Give up and do it manually
The core issue: No systematic approach to diagnosing why the AI isn’t producing the desired output.
Most developers waste hours on trial-and-error debugging when the fix is often simple: the AI is missing critical context.
### Why This Happens
AI coding agents face a fundamental limitation: they only know what you tell them.
Unlike human developers who can:
- Ask clarifying questions
- Infer unstated requirements from experience
- Reference tribal knowledge and conventions
- Navigate ambiguity through intuition
AI agents operate in a vacuum of provided context. When output is wrong, it’s usually because:
- **Missing context**: Relevant code files, architecture, patterns not included
- **Unclear instructions**: Ambiguous requirements, missing examples, vague success criteria
- **Insufficient model capability**: The task genuinely exceeds the model's reasoning ability (rare)
- **Impossible task**: No amount of context or prompting can solve it (very rare)
### The Cost of Unsystematic Debugging
Without a framework, developers typically:
**Start with low-probability fixes**:
- “Let me try GPT-4 instead” (fixes 10% of issues)
- “Maybe I need Opus” (fixes 10% of issues)
- “Let me rewrite this prompt” (fixes 25% of issues)
**Ignore high-probability fixes**:
- “Let me include relevant files” (fixes 60% of issues) ← START HERE
**Time wasted**:

```text
Scenario: Missing context issue (60% of cases)

Unsystematic approach:
- Try different model: 15 min ✗
- Rewrite prompt: 10 min ✗
- Try another tool: 20 min ✗
- Finally add context: 5 min ✓
Total: 50 minutes

Systematic approach:
- Add context first: 5 min ✓
Total: 5 minutes

Time saved: 45 minutes (90% reduction)
```
## The Solution: Hierarchical Debugging Protocol
The Context Debugging Framework provides a **systematic, ordered approach** to diagnosing and fixing AI code generation issues.
**Core principle**: Debug in order of **likelihood of success**, not convenience.
### The Four-Layer Hierarchy
```text
┌─────────────────────────────────────────────────────────┐
│ Layer 1: CONTEXT (60% of issues) │
│ Add missing information, files, examples, architecture │
└─────────────────────────────────────────────────────────┘
↓
Fix not found? Proceed to Layer 2
↓
┌─────────────────────────────────────────────────────────┐
│ Layer 2: PROMPTING (25% of issues) │
│ Refine instructions, add examples, clarify constraints │
└─────────────────────────────────────────────────────────┘
↓
Fix not found? Proceed to Layer 3
↓
┌─────────────────────────────────────────────────────────┐
│ Layer 3: MODEL POWER (10% of issues) │
│ Escalate to more powerful model for complex reasoning │
└─────────────────────────────────────────────────────────┘
↓
Fix not found? Proceed to Layer 4
↓
┌─────────────────────────────────────────────────────────┐
│ Layer 4: MANUAL OVERRIDE (5% of issues) │
│ Recognize when human intuition/intervention needed │
└─────────────────────────────────────────────────────────┘
```

**Why this order?** Each layer is ordered by probability of fixing the issue. Starting with context (60% success rate) maximizes efficiency.
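To make the protocol concrete, here is a minimal TypeScript sketch of the loop it implies; the `Layer` names and the `tryFix` callback are illustrative placeholders, not part of any particular tool.

```typescript
// A minimal sketch of the debugging loop; `tryFix` stands in for whatever you
// actually do at each layer (adding files, rewriting the prompt, switching models).
type Layer = 'context' | 'prompting' | 'model-power' | 'manual';

const LAYERS: { layer: Layer; expectedFixRate: number }[] = [
  { layer: 'context', expectedFixRate: 0.6 },
  { layer: 'prompting', expectedFixRate: 0.25 },
  { layer: 'model-power', expectedFixRate: 0.1 },
  { layer: 'manual', expectedFixRate: 0.05 },
];

async function debugAiOutput(
  tryFix: (layer: Layer) => Promise<boolean> // true = output is now correct
): Promise<Layer | null> {
  for (const { layer } of LAYERS) {
    // Work the layers in order of likelihood; never skip ahead to model power.
    if (await tryFix(layer)) return layer;
  }
  return null; // exhausted all layers (should be rare)
}
```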
## Layer 1: Context (60% of Issues)

**Problem signature**: AI produces plausible but incorrect code that doesn't fit the codebase.

**Root cause**: AI lacks information about:

- Existing patterns and conventions
- Architecture and design decisions
- Related code and dependencies
- Domain-specific requirements
- Error messages and stack traces

### Context Debugging Checklist

When AI output is wrong, systematically add context:

#### 1. Include Relevant Code Files

**Before** (AI has no context):

```typescript
// Prompt: "Create a user authentication endpoint"
// AI generates (generic, doesn't fit project):
export async function login(req: Request, res: Response) {
const { email, password } = req.body;
const user = await User.findOne({ email });
if (user && user.password === password) {
res.json({ success: true, user });
} else {
res.status(401).json({ error: 'Invalid credentials' });
}
}
```
**After** (AI sees existing patterns):
```typescript
// Added context: Include existing auth endpoint as example
// File: src/api/auth/register.ts
export const registerHandler = async (input: RegisterInput): Promise<AuthResult> => {
const validation = validateEmail(input.email);
if (!validation.valid) {
return { success: false, errors: validation.errors };
}
// ... rest of implementation
return { success: true, user: newUser };
};
// Prompt: "Create a user authentication endpoint following the pattern in register.ts"
// AI generates (matches project patterns):
export const loginHandler = async (input: LoginInput): Promise<AuthResult> => {
const validation = validateEmail(input.email);
if (!validation.valid) {
return { success: false, errors: validation.errors };
}
const user = await findUserByEmail(input.email);
if (!user) {
return { success: false, errors: ['Invalid credentials'] };
}
const passwordValid = await verifyPassword(input.password, user.passwordHash);
if (!passwordValid) {
return { success: false, errors: ['Invalid credentials'] };
}
return { success: true, user };
};
```

**Result**: Code matches existing patterns, uses correct types, follows conventions.

#### 2. Provide System Architecture

**Before**:
Prompt: "Add caching to the API"
AI generates: Simple in-memory cache (doesn't scale, wrong for architecture)
**After**:

```text
Context:
Our architecture:
- Next.js frontend
- tRPC API layer
- Redis for distributed caching
- PostgreSQL database
- Deployed on Vercel (serverless)
Caching requirements:
- Must work across serverless instances (no in-memory cache)
- Must integrate with existing Redis setup
- Must follow tRPC middleware pattern
Prompt: "Add caching to the API following our architecture"
```
**AI generates**: Redis-backed cache with tRPC middleware (fits architecture perfectly)
#### 3. Include Error Messages and Stack Traces
**Before**:
Prompt: “Fix this bug where the API returns an error”
AI: “Can you provide more details about the error?”
**After**:

```text
Context:
Error message:
TypeError: Cannot read property 'id' of undefined
at getUserById (src/api/users.ts:23)
at processRequest (src/api/handler.ts:45)
Code at src/api/users.ts:23:
const user = await db.users.findUnique({ where: { id: userId } });
return user.id; // Line 23 - crashes if user is null
```text
Prompt: "Fix this null reference bug"
AI generates:

```typescript
const user = await db.users.findUnique({ where: { id: userId } });
if (!user) {
throw new Error(`User ${userId} not found`);
}
return user.id;
```
**Result**: Precise fix addressing the exact issue.
#### 4. Show Database Schemas and API Contracts
**Before**:
Prompt: “Query the users table”
AI generates: Generic SQL that doesn’t match schema
**After**:
```typescript
// Context: Include Prisma schema
model User {
id String @id @default(cuid())
email String @unique
emailVerified DateTime?
name String?
image String?
accounts Account[]
sessions Session[]
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
}
// Prompt: "Query the users table to find users created in the last 7 days"
// AI generates (matches schema exactly):
const recentUsers = await prisma.user.findMany({
where: {
createdAt: {
gte: new Date(Date.now() - 7 * 24 * 60 * 60 * 1000),
},
},
select: {
id: true,
email: true,
name: true,
createdAt: true,
},
});
```

**Result**: Type-safe query matching the exact schema.

#### 5. Provide Examples of Expected Behavior

**Before**:
Prompt: "Implement input validation"
AI generates: Generic validation with throw statements (doesn't match project)
**After**:

```typescript
// Context: Show existing validation example
// From src/lib/validation.ts:
export function validateEmail(email: string): ValidationResult {
const errors: string[] = [];
if (!email) {
errors.push('Email is required');
} else if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email)) {
errors.push('Email format is invalid');
}
return {
valid: errors.length === 0,
errors,
};
}
// Prompt: "Implement password validation following the validateEmail pattern"
// AI generates (matches pattern):
export function validatePassword(password: string): ValidationResult {
const errors: string[] = [];
if (!password) {
errors.push('Password is required');
} else if (password.length < 8) {
errors.push('Password must be at least 8 characters');
} else if (!/[A-Z]/.test(password)) {
errors.push('Password must contain an uppercase letter');
} else if (!/[0-9]/.test(password)) {
errors.push('Password must contain a number');
}
return {
valid: errors.length === 0,
errors,
};
}
```
**Result**: Consistent validation pattern across codebase.
### How to Gather Context Efficiently
For Claude Code and similar tools:
**Use `@` mentions to include files**:
@src/api/users.ts @src/lib/database.ts
Implement createUser endpoint following patterns in users.ts
**Reference CLAUDE.md files**:
Claude Code automatically loads CLAUDE.md files from:
- Root: /CLAUDE.md (project-wide patterns)
- Domain: /src/api/CLAUDE.md (API-specific patterns)
Ensure these files contain:
- Architecture overview
- Common patterns with examples
- Coding conventions
- Anti-patterns to avoid
**Include relevant test files**:
@tests/api/users.test.ts
Implement createUser endpoint that passes these tests
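For illustration, a hypothetical test file of this kind (sketched assuming Vitest; the import path and `createUserHandler` name are made up) gives the AI concrete behavior to target:

```typescript
// Hypothetical example of a test file worth including as context.
import { describe, expect, it } from 'vitest';
import { createUserHandler } from '../../src/api/users/create'; // assumed path

describe('createUser', () => {
  it('rejects an invalid email', async () => {
    const result = await createUserHandler({ email: 'not-an-email', name: 'Ada' });
    expect(result.success).toBe(false);
  });

  it('returns the created user on success', async () => {
    const result = await createUserHandler({ email: '[email protected]', name: 'Ada' });
    expect(result.success).toBe(true);
  });
});
```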
### Context Layer Success Criteria
You've exhausted the context layer when:
- ✓ All relevant code files are included
- ✓ Architecture and design patterns are explained
- ✓ Database schemas and API contracts are provided
- ✓ Error messages and stack traces are included
- ✓ Working examples are shown
- ✓ Domain-specific requirements are stated
**If AI still produces incorrect output**, proceed to Layer 2.
## Layer 2: Prompting (25% of Issues)
**Problem signature**: AI has context but produces output that doesn't meet requirements.
**Root cause**: Instructions are ambiguous, missing edge cases, or lack clear success criteria.
### Prompting Debugging Checklist
#### 1. Add Specific Examples of Desired Output
**Before** (vague requirement):
Prompt: “Format user data for display”
AI generates: Generic JSON formatting (not what you wanted)
**After** (specific example):
Prompt: “Format user data for display

Expected output:

Input: { id: 1, email: '[email protected]', createdAt: '2025-01-15T10:30:00Z' }
Output: '[email protected] (joined Jan 15, 2025)'

Input: { id: 2, email: '[email protected]', createdAt: '2024-12-01T14:20:00Z' }
Output: '[email protected] (joined Dec 1, 2024)'
”
AI generates: Exact format you specified
**Result**: Unambiguous output format.
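For comparison, one plausible implementation that satisfies those examples — a sketch only; the `UserRecord` type and function name are assumptions:

```typescript
// Sketch of the formatter the example-driven prompt pins down.
interface UserRecord {
  id: number;
  email: string;
  createdAt: string; // ISO 8601 timestamp
}

export function formatUserForDisplay(user: UserRecord): string {
  const joined = new Date(user.createdAt).toLocaleDateString('en-US', {
    month: 'short',
    day: 'numeric',
    year: 'numeric',
    timeZone: 'UTC',
  });
  return `${user.email} (joined ${joined})`;
}

// formatUserForDisplay({ id: 1, email: '[email protected]', createdAt: '2025-01-15T10:30:00Z' })
// → '[email protected] (joined Jan 15, 2025)'
```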
#### 2. Include Edge Cases and Constraints
**Before**:
Prompt: “Parse date from string”
AI generates: Simple parsing (doesn’t handle edge cases)
**After**:
Prompt: “Parse date from string
Edge cases to handle:
- Invalid date strings: return null
- Missing date: return null
- Timezone handling: parse as UTC
- Date format: ISO 8601 only
- Out of range dates: return null
Constraints:
- Never throw exceptions
- Return type: Date | null
- Use date-fns library (already installed)
“
AI generates: Robust parsing with all edge cases handled
**Result**: Production-ready error handling.
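As a reference point, a correct result could look roughly like this sketch, which assumes the date-fns library the prompt says is installed:

```typescript
import { isValid, parseISO } from 'date-fns';

// Sketch of the behavior the refined prompt asks for: ISO 8601 only,
// Date | null return type, and no exceptions for bad input.
export function parseDate(input: string | null | undefined): Date | null {
  if (!input) return null; // missing date → null

  const parsed = parseISO(input); // ISO 8601 parsing via date-fns
  if (!isValid(parsed)) return null; // invalid or out-of-range → null

  // Note: offset-less ISO strings are interpreted as local time by parseISO;
  // normalizing them to UTC is left out of this sketch.
  return parsed;
}

// parseDate('2025-01-15T10:30:00Z') → Date; parseDate('not a date') → null
```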
#### 3. Provide Clear Success Criteria
**Before**:
Prompt: “Optimize this function”
AI generates: Random micro-optimizations (unclear if successful)
**After**:
Prompt: “Optimize this function
Success criteria:
- Reduce execution time from 500ms to <100ms
- Maintain existing behavior (all tests pass)
- Handle lists up to 10,000 items
- Reduce memory allocation by 50%
Current performance:
- 500ms for 1,000 items
- 5,000ms for 10,000 items
- 50MB memory allocation
“
AI generates: Targeted optimizations addressing specific metrics
**Result**: Measurable improvement with clear target.
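A small measurement harness like the following sketch (the `processItems` call and the thresholds are placeholders taken from the prompt) turns those criteria into something checkable:

```typescript
import { performance } from 'node:perf_hooks';

// Run a function and fail loudly if it misses the stated time budget.
export function assertFastEnough<T>(label: string, maxMs: number, fn: () => T): T {
  const start = performance.now();
  const result = fn();
  const elapsed = performance.now() - start;
  if (elapsed > maxMs) {
    throw new Error(`${label}: ${elapsed.toFixed(1)}ms exceeds target of ${maxMs}ms`);
  }
  return result;
}

// Usage against the criteria above (processItems is a placeholder):
// assertFastEnough('1,000 items', 100, () => processItems(thousandItems));
```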
#### 4. Break Complex Tasks into Steps
**Before** (monolithic task):
Prompt: “Implement user authentication with OAuth, email verification, password reset, and 2FA”
AI generates: Incomplete or buggy implementation (task too complex)
**After** (stepwise):
Prompt: “Implement user authentication – STEP 1: Basic email/password login
Requirements for this step:
- Accept email and password
- Validate against database
- Return JWT token on success
- Return error on failure
We’ll add OAuth, email verification, and 2FA in subsequent steps.
“
AI generates: Clean, focused implementation of Step 1
[Then continue with Step 2, Step 3, etc.]
**Result**: Incremental, testable progress.
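For a rough sense of what Step 1 might produce, here is a sketch that assumes `bcrypt` and `jsonwebtoken`; the `findUserByEmail` dependency is hypothetical:

```typescript
import { compare } from 'bcrypt';
import { sign } from 'jsonwebtoken';

interface LoginInput { email: string; password: string }
type LoginResult =
  | { success: true; token: string }
  | { success: false; error: string };

// Step 1 only: email/password against the database, JWT on success.
// OAuth, email verification, and 2FA are deliberately out of scope here.
export async function loginStep1(
  input: LoginInput,
  findUserByEmail: (email: string) => Promise<{ id: string; passwordHash: string } | null>,
  jwtSecret: string
): Promise<LoginResult> {
  const user = await findUserByEmail(input.email);
  if (!user) return { success: false, error: 'Invalid credentials' };

  const passwordValid = await compare(input.password, user.passwordHash);
  if (!passwordValid) return { success: false, error: 'Invalid credentials' };

  const token = sign({ sub: user.id }, jwtSecret, { expiresIn: '1h' });
  return { success: true, token };
}
```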
#### 5. Use Structured Formats
**Before** (narrative prompt):
Prompt: “We need a function that takes user data and validates it, making sure the email is correct and the password meets our requirements, and also checking if the username is available”
AI generates: Confused implementation mixing concerns
**After** (structured prompt):
````markdown
Prompt:

## Function Signature

```typescript
function validateUser(data: UserInput): ValidationResult
```

## Validation Rules

- Email:
  - Required: Yes
  - Format: Valid email regex
  - Uniqueness: Check database
- Password:
  - Required: Yes
  - Min length: 8 characters
  - Must contain: uppercase, lowercase, number
- Username:
  - Required: Yes
  - Min length: 3 characters
  - Max length: 20 characters
  - Characters: alphanumeric, underscore, dash only
  - Uniqueness: Check database

## Return Type

```typescript
interface ValidationResult {
  valid: boolean;
  errors: Array<{ field: string; message: string }>;
}
```

## Example

```typescript
validateUser({ email: 'invalid', password: 'weak', username: 'ab' })
// Returns:
// {
//   valid: false,
//   errors: [
//     { field: 'email', message: 'Invalid email format' },
//     { field: 'password', message: 'Must contain uppercase letter' },
//     { field: 'username', message: 'Must be at least 3 characters' }
//   ]
// }
```
````

**AI generates**: Clean, structured validation with the exact interface.

**Result**: Precise implementation matching the spec.
### Prompting Layer Success Criteria
You’ve exhausted the prompting layer when:
- ✓ Specific examples of desired output are provided
- ✓ All edge cases and constraints are stated
- ✓ Clear success criteria are defined
- ✓ Complex tasks are broken into steps
- ✓ Structured formats (tables, JSON, code blocks) are used
**If AI still produces incorrect output**, proceed to Layer 3.

## Layer 3: Model Power (10% of Issues)

**Problem signature**: AI has context and clear instructions but still fails due to reasoning limitations.

**Root cause**: Task genuinely exceeds the model's capabilities.

### When to Escalate Model Power

Only escalate when:

- **Context and prompting are exhausted** (Layers 1 & 2 complete)
- **Task requires advanced reasoning**: Complex algorithms, architectural decisions, multi-step planning
- **Consistent failures**: Same error across multiple attempts with good context

### Model Escalation Strategy
```text
Level 1: Standard Model (Claude Sonnet, GPT-4 Turbo)
  ↓ (if fails with good context/prompting)
Level 2: Powerful Model (Claude Opus, GPT-4)
  ↓ (if still fails)
Level 3: Specialized Model (code-specific models for narrow tasks)
  ↓ (if still fails)
Level 4: Manual Override (human intervention needed)
```
### Example: Complex Architecture Decision

**Task**: Design a real-time collaborative editing system with conflict resolution

**Context & prompting** (from Layers 1 & 2):

- Existing codebase patterns
- Requirements and constraints
- Example use cases
- Success criteria

**Claude Sonnet** (standard model):
```text
Result: Generic CRDT implementation without considering specific constraints
Issue: Lacks deep reasoning about trade-offs and edge cases
```
**Claude Opus** (escalated):
```text
Result: Detailed analysis of CRDT vs. OT approaches, considering:
- Network latency characteristics
- Database consistency requirements
- Client-side storage constraints
- Conflict resolution strategies
Recommendation: Hybrid approach with specific implementation
Outcome: ✓ Successful
```
**Key insight**: Some tasks genuinely need more powerful reasoning.

### Model Power vs. Context Trade-off

**Important**: Don't use model power to compensate for missing context.
```text
❌ Bad approach:
"Opus will figure it out without context" (expensive, unreliable)

✓ Good approach:
"Provide context to Sonnet first, escalate to Opus only if needed"
```
**Cost implications**:
```text
Claude Sonnet: $3 per 1M input tokens
Claude Opus: $15 per 1M input tokens (5x more expensive)

Strategy:
1. Try Sonnet with good context (90% success, low cost)
2. Escalate to Opus only for the remaining 10% (higher cost, rare)

Result: Optimal cost/performance balance
```
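One way to encode that strategy, sketched with hypothetical helper names (the real calls depend on your SDK or agent tooling):

```typescript
// Hypothetical names: `tryModel` stands in for however your tooling calls a model
// and judges the result (tests pass, output matches the success criteria, etc.).
type Model = 'claude-sonnet' | 'claude-opus';

interface Attempt {
  model: Model;
  output: string;
  acceptable: boolean;
}

export async function generateWithEscalation(
  prompt: string,
  contextFiles: string[],
  tryModel: (model: Model, prompt: string, contextFiles: string[]) => Promise<Attempt>
): Promise<Attempt> {
  // Layers 1 & 2 are assumed done: context and prompt are already as good as you can make them.
  const sonnet = await tryModel('claude-sonnet', prompt, contextFiles);
  if (sonnet.acceptable) return sonnet; // the common, cheap case

  // Escalate to the more expensive model only for the hard remainder.
  return tryModel('claude-opus', prompt, contextFiles);
}
```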
### Model Power Success Criteria
You’ve exhausted model power when:
- ✓ Standard model fails with good context/prompting
- ✓ Powerful model fails with same inputs
- ✓ Specialized models fail (if applicable)
**If AI still fails**, proceed to Layer 4.

## Layer 4: Manual Override (5% of Issues)

**Problem signature**: No AI model can solve the problem, regardless of context or capability.

**Root cause**: Task requires human intuition, domain expertise, or is genuinely impossible.

### When to Go Manual

Recognize these scenarios early:
#### 1. Requires Deep Domain Expertise

**Example**: "Design a medical diagnosis algorithm for a rare disease"

**Why manual**: Requires specialized medical knowledge, liability concerns, regulatory compliance

**Approach**:
- Human expert designs the approach
- AI implements under expert guidance
- Expert validates all decisions

#### 2. Requires Human Intuition or Creativity

**Example**: "Design a brand identity that captures the essence of our mission"

**Why manual**: Subjective; requires understanding of brand values, market positioning, human emotional response

**Approach**:
- Human designer creates concepts
- AI helps with implementation/variations
- Human makes final creative decisions

#### 3. Ambiguous or Contradictory Requirements

**Example**: "Make it faster but more comprehensive" (contradiction)

**Why manual**: Requires stakeholder discussion to resolve trade-offs

**Approach**:
- Human facilitates requirements clarification
- Once requirements are clear, AI implements the solution

#### 4. Legacy System with Tribal Knowledge

**Example**: "Fix a bug in an undocumented 15-year-old Perl script"

**Why manual**: No context available; requires a debugging archaeological dig

**Approach**:
- Human debugs and documents findings
- AI helps refactor once behavior is understood
### Manual Override Pattern

Don't abandon AI entirely – use a hybrid approach:

1. Human: Solve the core problem manually
2. Human: Document the solution clearly
3. AI: Implement the solution following the documentation
4. Human: Review and validate
5. AI: Expand to related cases

**Example**:

```typescript
// Human: Manually solve first instance
// Problem: Parse complex legacy log format (no documentation)
// After manual analysis:
// Log format: [TIMESTAMP][LEVEL][MODULE:SUBMODULE] message | context:value
// Human creates example parser for one case:
function parseLogLine(line: string): LogEntry {
const timestampMatch = line.match(/^\[(.*?)\]/);
const timestamp = timestampMatch ? timestampMatch[1] : null;
// ... manual implementation ...
}
// AI: Now generate parser for all log levels using this pattern
// AI: Generate tests using the examples I provided
// AI: Add error handling for malformed lines
```
**Result**: Human solves hard problem, AI scales solution.
### Manual Override Success Criteria
Go manual when:
- ✓ All 3 previous layers exhausted (Context → Prompting → Model Power)
- ✓ Task requires specialized domain expertise
- ✓ Task requires subjective human judgment
- ✓ Requirements are ambiguous or contradictory
- ✓ No documentation exists for legacy system
**Important**: Manual doesn't mean "give up on AI" - it means **human-AI hybrid approach**.
## Practical Application
### Real-World Example: API Endpoint Failure
**Scenario**: AI generates API endpoint, but it crashes in production.
**Layer 1: Context** ✓
Add context:
```typescript
// Include existing endpoint patterns
@src/api/users/list.ts
@src/lib/api-handler.ts
// Include error logs
Error: Cannot read property 'map' of undefined
at getUsersHandler (api/users/search.ts:15)
// Include database schema
@prisma/schema.prisma
```

Result: AI sees the issue – it needs a null check before the `.map()` call.

**Fix generated**:

```typescript
export const searchUsersHandler = async (input: SearchInput): Promise<UserList> => {
const users = await prisma.user.findMany({
where: { name: { contains: input.query } },
});
// Added null check
if (!users) {
return { users: [], total: 0 };
}
return {
users: users.map(u => ({ id: u.id, name: u.name, email: u.email })),
total: users.length,
};
};
```
**Result**: ✓ Fixed at Layer 1 (context)
**Time**: 5 minutes
---
**Alternative scenario**: What if context didn't fix it?
**Layer 2: Prompting** ✓
Refine prompt:
```markdown
Prompt: "Implement searchUsers endpoint
Success criteria:
- Return empty array (not null) when no results
- Handle null/undefined from database gracefully
- Match return type exactly: { users: User[], total: number }
- Include error handling for database failures
Edge cases:
- Empty search query → return all users (paginated)
- Special characters in query → escape for SQL safety
- Database timeout → return error result
"
```

Result: AI generates a robust implementation with all edge cases

**Result**: ✓ Fixed at Layer 2 (prompting)
**Time**: 10 minutes total
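For illustration, the kind of implementation such a prompt tends to produce might look like this sketch (the `SearchInput`/`UserList` types and the shared `prisma` client are assumptions):

```typescript
// A sketch only – names and error-handling strategy are illustrative.
import { prisma } from '../lib/prisma'; // assumed shared Prisma client

interface SearchInput { query: string; page?: number; pageSize?: number }
interface UserSummary { id: string; name: string | null; email: string }
interface UserList { users: UserSummary[]; total: number }

export const searchUsersHandler = async (input: SearchInput): Promise<UserList> => {
  const pageSize = input.pageSize ?? 20;
  const skip = ((input.page ?? 1) - 1) * pageSize;

  // Empty query → return all users (paginated). Prisma parameterizes the filter,
  // so special characters in the query need no manual SQL escaping.
  const where = input.query ? { name: { contains: input.query } } : {};

  try {
    const [users, total] = await Promise.all([
      prisma.user.findMany({
        where,
        skip,
        take: pageSize,
        select: { id: true, name: true, email: true },
      }),
      prisma.user.count({ where }),
    ]);
    // Always an array, never null.
    return { users: users ?? [], total };
  } catch {
    // Database failure or timeout → surface a clear error
    // (simplified here; a real handler would use the project's error type).
    throw new Error('searchUsers failed: database unavailable');
  }
};
```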
---

**Alternative scenario**: What if prompting didn't fix it?

**Layer 3: Model Power** ✓

Escalate to Claude Opus:

```text
Same context + prompting, but with Opus’s stronger reasoning
Result: Opus recognizes complex interaction between pagination, filtering, and database query optimization
```

**Result**: ✓ Fixed at Layer 3 (model power)
**Time**: 20 minutes total
**Cost**: Higher (5x model cost), but rare
## Best Practices
### 1. Always Start with Layer 1 (Context)
**Never skip context**. It's tempting to jump to "let me try Opus" but:
```text
Context fixes: 60% (5 min avg)
Prompt fixes: 25% (10 min avg)
Model fixes: 10% (20 min avg)
Manual fixes: 5% (varies; assume 60 min)

Expected time with proper ordering:
0.6 × 5 + 0.25 × 10 + 0.1 × 20 + 0.05 × 60 = 10.5 min avg

Expected time starting with Model (wrong order):
Only 10% of issues are fixed by the model switch; the other 90% waste
~20 min before falling back to the normal path:
0.1 × 20 + 0.9 × (20 + 10.5) ≈ 29.5 min avg (plus wasted expensive model calls)
```
### 2. Don't Waste Tokens on Genuinely Hard Problems
**Key insight from the framework**: Some tasks need Opus.
Don’t try to save $0.10 by using Sonnet for a task that requires Opus. You’ll waste hours and still fail.
- ❌ Bad: Try Sonnet 10 times (10 × $0.015 = $0.15) → still fails → 2 hours wasted
- ✓ Good: Recognize the hard problem early → use Opus once ($0.075) → succeeds → 10 min
### 3. Build Context Libraries
Maintain reusable context:

```markdown
# /CLAUDE.md (root)

Architecture:
- Next.js 14 with App Router
- tRPC for API layer
- Prisma + PostgreSQL
- Deployed on Vercel

Patterns:
- All API handlers return Result<T, Error> (never throw)
- Use Zod for runtime validation
- Prefer server components over client

[... more patterns ...]
```

Result: Every AI request automatically includes this context.
### 4. Document Solutions for Future Context
When you solve a hard problem (Layer 3 or 4):
````markdown
# Add to CLAUDE.md

## Complex Log Parsing Pattern

When parsing legacy logs:

```typescript
// Use this pattern:
function parseLogLine(line: string): LogEntry | null {
  try {
    const [timestamp, level, module, ...rest] = line.split('|');
    return { timestamp, level, module, message: rest.join('|') };
  } catch {
    return null; // Invalid log line
  }
}
```

This pattern handles:
- Malformed lines (return null)
- Messages with | character (rejoin with |)
- Missing fields (handled by the try/catch)
````

Result: Next time, AI solves at Layer 1 (context) instead of Layer 4 (manual).
### 5. Track Success Rate by Layer
Monitor where fixes happen:

```typescript
interface DebugMetrics {
  contextFixes: number; // Layer 1
  promptFixes: number;  // Layer 2
  modelFixes: number;   // Layer 3
  manualFixes: number;  // Layer 4
}

// Expected distribution:
// Context: 60%
// Prompting: 25%
// Model: 10%
// Manual: 5%

// If your distribution is off:
// Too many prompt fixes → improve context (CLAUDE.md, examples)
// Too many model fixes → improve prompting (clearer instructions)
// Too many manual fixes → improve context + prompting
```
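A small helper like this sketch (hypothetical; it reuses the `DebugMetrics` interface above) turns raw counts into percentages you can compare against those targets:

```typescript
// Hypothetical helper; reuses the DebugMetrics interface defined above.
function fixDistribution(m: DebugMetrics): Record<keyof DebugMetrics, number> {
  const total = m.contextFixes + m.promptFixes + m.modelFixes + m.manualFixes;
  const pct = (n: number) => (total === 0 ? 0 : Math.round((n / total) * 100));
  return {
    contextFixes: pct(m.contextFixes),
    promptFixes: pct(m.promptFixes),
    modelFixes: pct(m.modelFixes),
    manualFixes: pct(m.manualFixes),
  };
}

// Example: { contextFixes: 40, promptFixes: 35, modelFixes: 20, manualFixes: 5 }
// → far too many prompt and model fixes; invest in CLAUDE.md and examples first.
```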
## Integration with Other Patterns
### Context Debugging + Hierarchical Context Patterns

Hierarchical context patterns prevent context issues:

- Without hierarchical context: ~60% of issues land at the context layer (frequent debugging)
- With hierarchical context: ~30% of issues land at the context layer (reduced debugging)

See: Hierarchical Context Patterns
### Context Debugging + Trust But Verify Protocol

The trust-but-verify protocol helps recognize Layer 4 scenarios:

- When to escalate to a human expert
- When a task needs human judgment
- When requirements need clarification

See: Trust But Verify Protocol

### Context Debugging + Model Switching Strategy

Model switching complements Layer 3:

- Context debugging determines when to switch
- Model switching determines which model to use
## Common Pitfalls

### ❌ Pitfall 1: Skipping to Model Power

**Mistake**: "Let me just try Opus"
**Problem**: Wastes expensive model calls on context issues
**Fix**: Always exhaust Layer 1 (context) first

### ❌ Pitfall 2: Adding Irrelevant Context

**Mistake**: Including the entire codebase "just in case"
**Problem**: Noise reduces signal and confuses the AI
**Fix**: Only include relevant files and examples

### ❌ Pitfall 3: Vague Prompting

**Mistake**: "Make it better" or "Fix this"
**Problem**: AI has no success criteria
**Fix**: Provide specific examples, edge cases, constraints

### ❌ Pitfall 4: Persisting When Manual Is Needed

**Mistake**: Trying 20+ iterations when the task needs human judgment
**Problem**: Wastes time and money
**Fix**: Recognize Layer 4 scenarios early and go hybrid

### ❌ Pitfall 5: Not Documenting Solutions

**Mistake**: Solving a hard problem manually without documenting it
**Problem**: The next person hits the same issue and wastes time
**Fix**: Add the solution to CLAUDE.md for future context
## Measuring Success

### Metrics to Track

1. **Fix rate by layer**
   - Target: Context 60%, Prompting 25%, Model 10%, Manual 5%
   - Actual: [measure your distribution]
2. **Average debugging time**
   - Target: <10 minutes per issue
   - Measure: Track time from "AI output wrong" to "AI output correct"
3. **Model escalation rate**
   - Target: <10% of tasks need a powerful model
   - Measure: Sonnet usage vs. Opus usage
4. **Manual override rate**
   - Target: <5% of tasks need manual work
   - Measure: Tasks where a human had to implement the core logic
### Success Indicators
You’re using the framework effectively when:
- ✓ Most issues resolve at Layer 1 (context)
- ✓ Average debugging time is <10 minutes
- ✓ Rare model escalations (cost-efficient)
- ✓ Manual overrides are documented for future context
- ✓ Same issues don’t recur (learning accumulated)
## Conclusion
The Context Debugging Framework provides a systematic, efficient approach to diagnosing and fixing AI code generation issues.
**Key Takeaways**:
- Debug hierarchically: Context (60%) → Prompting (25%) → Model (10%) → Manual (5%)
- Context first: Most issues stem from missing information, not model limitations
- Don’t skip layers: Starting with expensive solutions wastes time and money
- Document solutions: Turn Layer 3/4 fixes into Layer 1 context for next time
- Recognize limitations: Some tasks need human judgment – use hybrid approach
The result: Faster debugging, lower costs, and continuous improvement as solved problems become context for future tasks.
Instead of trial-and-error debugging that wastes hours, you have a proven protocol that resolves most issues in minutes.
## Related Concepts
- Hierarchical Context Patterns – Prevent context issues proactively
- Context Rot Auto-Compacting – Prevent context degradation in long sessions
- Progressive Disclosure Context – Load context only when needed
- MCP Server for Dynamic Project Context – Solve Layer 1 context issues with queryable project knowledge
- Clean Slate Trajectory Recovery – Escaping bad LLM trajectories
- Sliding Window History – Bounded state management for context retention
- Trust But Verify Protocol – Recognize when to escalate to humans
- Model Switching Strategy – Complement Layer 3 with smart model selection
- Prompt Caching Strategy – Make context-heavy debugging cost-efficient
- Five-Point Error Diagnostic Framework – Structured error analysis
- Human-First DX Philosophy – Clear context and documentation helps humans, which directly improves AI performance
- Error Messages as Training Data – ERRORS.md provides Layer 1 context for debugging
- Prevention Protocol – Turn debugged issues into systematic prevention measures
- Test-Based Regression Patching – Write failing tests to provide concrete verification targets
## References
- Claude API Documentation – Official Claude API documentation for understanding model capabilities
- Prompt Engineering Guide – Comprehensive guide to effective prompting techniques

