## Summary

When AI doesn't produce the desired output, follow a systematic debugging hierarchy: Context (60% of issues) → Prompting (25%) → Model Power (10%) → Manual (5%). This protocol maximizes the probability of a fix by addressing root causes in order of likelihood, avoiding time wasted on model switching when the real issue is missing context or unclear instructions.
### The Problem
AI doesn’t produce the desired output, but it’s unclear why. Developers waste time trying random fixes (switching models, rephrasing prompts, adding more tokens) without a systematic approach. This trial-and-error debugging is inefficient and often misses the root cause: insufficient context accounts for 60% of AI failures, yet developers frequently jump to changing models (which fixes only 10% of issues).
### The Solution
Follow a hierarchical debugging protocol ordered by likelihood of fixing the issue: 1) Context Layer (60%): Add missing information, relevant files, examples, architecture diagrams. 2) Prompting Layer (25%): Refine instructions with specific examples, edge cases, success criteria. 3) Model Power (10%): Escalate to more powerful models for genuinely complex tasks. 4) Manual Override (5%): Recognize when human intuition is needed. This systematic approach resolves most issues at the context layer, saving time and cost.
## The Problem
You’re working with Claude Code, Cursor, or another AI coding agent. You give it a task, and it produces… something. But not what you wanted.
The frustration cycle begins:
- Try rephrasing the prompt
- Switch to a more powerful model
- Add more detailed instructions
- Try a different AI tool entirely
- Give up and do it manually
The core issue: No systematic approach to diagnosing why the AI isn’t producing the desired output.
Most developers waste hours on trial-and-error debugging when the fix is often simple: the AI is missing critical context.
### Why This Happens
AI coding agents face a fundamental limitation: they only know what you tell them.
Unlike human developers who can:
- Ask clarifying questions
- Infer unstated requirements from experience
- Reference tribal knowledge and conventions
- Navigate ambiguity through intuition
AI agents operate in a vacuum of provided context. When output is wrong, it’s usually because:
- **Missing context**: Relevant code files, architecture, patterns not included
- **Unclear instructions**: Ambiguous requirements, missing examples, vague success criteria
- **Insufficient model capability**: The task genuinely exceeds the model's reasoning ability (rare)
- **Impossible task**: No amount of context or prompting can solve it (very rare)
### The Cost of Unsystematic Debugging
Without a framework, developers typically:
**Start with low-probability fixes**:
- “Let me try GPT-4 instead” (fixes 10% of issues)
- “Maybe I need Opus” (fixes 10% of issues)
- “Let me rewrite this prompt” (fixes 25% of issues)
**Ignore high-probability fixes**:
- “Let me include relevant files” (fixes 60% of issues) ← START HERE
**Time wasted**:

```text
Scenario: Missing context issue (60% of cases)

Unsystematic approach:
- Try different model: 15 min ✗
- Rewrite prompt: 10 min ✗
- Try another tool: 20 min ✗
- Finally add context: 5 min ✓
Total: 50 minutes

Systematic approach:
- Add context first: 5 min ✓
Total: 5 minutes

Time saved: 45 minutes (90% reduction)
```
## The Solution: Hierarchical Debugging Protocol
The Context Debugging Framework provides a **systematic, ordered approach** to diagnosing and fixing AI code generation issues.
**Core principle**: Debug in order of **likelihood of success**, not convenience.
### The Four-Layer Hierarchy
```text
┌─────────────────────────────────────────────────────────┐
│ Layer 1: CONTEXT (60% of issues) │
│ Add missing information, files, examples, architecture │
└─────────────────────────────────────────────────────────┘
↓
Fix not found? Proceed to Layer 2
↓
┌─────────────────────────────────────────────────────────┐
│ Layer 2: PROMPTING (25% of issues) │
│ Refine instructions, add examples, clarify constraints │
└─────────────────────────────────────────────────────────┘
↓
Fix not found? Proceed to Layer 3
↓
┌─────────────────────────────────────────────────────────┐
│ Layer 3: MODEL POWER (10% of issues) │
│ Escalate to more powerful model for complex reasoning │
└─────────────────────────────────────────────────────────┘
↓
Fix not found? Proceed to Layer 4
↓
┌─────────────────────────────────────────────────────────┐
│ Layer 4: MANUAL OVERRIDE (5% of issues) │
│ Recognize when human intuition/intervention needed │
└─────────────────────────────────────────────────────────┘
```

**Why this order?** Each layer is ordered by probability of fixing the issue. Starting with context (60% success rate) maximizes efficiency.
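To make the protocol concrete, here is a minimal TypeScript sketch of the loop it implies; the `Layer` names and the `tryFix` callback are illustrative placeholders, not part of any particular tool.

```typescript
// A minimal sketch of the debugging loop; `tryFix` stands in for whatever you
// actually do at each layer (adding files, rewriting the prompt, switching models).
type Layer = 'context' | 'prompting' | 'model-power' | 'manual';

const LAYERS: { layer: Layer; expectedFixRate: number }[] = [
  { layer: 'context', expectedFixRate: 0.6 },
  { layer: 'prompting', expectedFixRate: 0.25 },
  { layer: 'model-power', expectedFixRate: 0.1 },
  { layer: 'manual', expectedFixRate: 0.05 },
];

async function debugAiOutput(
  tryFix: (layer: Layer) => Promise<boolean> // true = output is now correct
): Promise<Layer | null> {
  for (const { layer } of LAYERS) {
    // Work the layers in order of likelihood; never skip ahead to model power.
    if (await tryFix(layer)) return layer;
  }
  return null; // exhausted all layers (should be rare)
}
```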
## Layer 1: Context (60% of Issues)

**Problem signature**: AI produces plausible but incorrect code that doesn't fit the codebase.

**Root cause**: AI lacks information about:

- Existing patterns and conventions
- Architecture and design decisions
- Related code and dependencies
- Domain-specific requirements
- Error messages and stack traces

### Context Debugging Checklist

When AI output is wrong, systematically add context:

#### 1. Include Relevant Code Files

**Before** (AI has no context):

```typescript
// Prompt: "Create a user authentication endpoint"
// AI generates (generic, doesn't fit project):
export async function login(req: Request, res: Response) {
const { email, password } = req.body;
const user = await User.findOne({ email });
if (user && user.password === password) {
res.json({ success: true, user });
} else {
res.status(401).json({ error: 'Invalid credentials' });
}
}
```
**After** (AI sees existing patterns):
```typescript
// Added context: Include existing auth endpoint as example
// File: src/api/auth/register.ts
export const registerHandler = async (input: RegisterInput): Promise<AuthResult> => {
const validation = validateEmail(input.email);
if (!validation.valid) {
return { success: false, errors: validation.errors };
}
// ... rest of implementation
return { success: true, user: newUser };
};
// Prompt: "Create a user authentication endpoint following the pattern in register.ts"
// AI generates (matches project patterns):
export const loginHandler = async (input: LoginInput): Promise<AuthResult> => {
const validation = validateEmail(input.email);
if (!validation.valid) {
return { success: false, errors: validation.errors };
}
const user = await findUserByEmail(input.email);
if (!user) {
return { success: false, errors: ['Invalid credentials'] };
}
const passwordValid = await verifyPassword(input.password, user.passwordHash);
if (!passwordValid) {
return { success: false, errors: ['Invalid credentials'] };
}
return { success: true, user };
};
```

**Result**: Code matches existing patterns, uses correct types, follows conventions.

#### 2. Provide System Architecture

**Before**:
Prompt: "Add caching to the API"
AI generates: Simple in-memory cache (doesn't scale, wrong for architecture)
**After**:

```text
Context:
Our architecture:
- Next.js frontend
- tRPC API layer
- Redis for distributed caching
- PostgreSQL database
- Deployed on Vercel (serverless)
Caching requirements:
- Must work across serverless instances (no in-memory cache)
- Must integrate with existing Redis setup
- Must follow tRPC middleware pattern
Prompt: "Add caching to the API following our architecture"
```
**AI generates**: Redis-backed cache with tRPC middleware (fits architecture perfectly)
#### 3. Include Error Messages and Stack Traces
**Before**:
Prompt: “Fix this bug where the API returns an error”
AI: “Can you provide more details about the error?”
**After**:

```text
Context:
Error message:
TypeError: Cannot read property 'id' of undefined
at getUserById (src/api/users.ts:23)
at processRequest (src/api/handler.ts:45)
Code at src/api/users.ts:23:
const user = await db.users.findUnique({ where: { id: userId } });
return user.id; // Line 23 - crashes if user is null
```text
Prompt: "Fix this null reference bug"
AI generates:

```typescript
const user = await db.users.findUnique({ where: { id: userId } });
if (!user) {
throw new Error(`User ${userId} not found`);
}
return user.id;
```
**Result**: Precise fix addressing the exact issue.
#### 4. Show Database Schemas and API Contracts
**Before**:
Prompt: “Query the users table”
AI generates: Generic SQL that doesn’t match schema
**After**:
```typescript
// Context: Include Prisma schema
model User {
id String @id @default(cuid())
email String @unique
emailVerified DateTime?
name String?
image String?
accounts Account[]
sessions Session[]
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
}
// Prompt: "Query the users table to find users created in the last 7 days"
// AI generates (matches schema exactly):
const recentUsers = await prisma.user.findMany({
where: {
createdAt: {
gte: new Date(Date.now() - 7 * 24 * 60 * 60 * 1000),
},
},
select: {
id: true,
email: true,
name: true,
createdAt: true,
},
});
```

**Result**: Type-safe query matching the exact schema.

#### 5. Provide Examples of Expected Behavior

**Before**:
Prompt: "Implement input validation"
AI generates: Generic validation with throw statements (doesn't match project)
**After**:

```typescript
// Context: Show existing validation example
// From src/lib/validation.ts:
export function validateEmail(email: string): ValidationResult {
const errors: string[] = [];
if (!email) {
errors.push('Email is required');
} else if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email)) {
errors.push('Email format is invalid');
}
return {
valid: errors.length === 0,
errors,
};
}
// Prompt: "Implement password validation following the validateEmail pattern"
// AI generates (matches pattern):
export function validatePassword(password: string): ValidationResult {
const errors: string[] = [];
if (!password) {
errors.push('Password is required');
} else if (password.length < 8) {
errors.push('Password must be at least 8 characters');
} else if (!/[A-Z]/.test(password)) {
errors.push('Password must contain an uppercase letter');
} else if (!/[0-9]/.test(password)) {
errors.push('Password must contain a number');
}
return {
valid: errors.length === 0,
errors,
};
}
```
**Result**: Consistent validation pattern across codebase.
### How to Gather Context Efficiently
For Claude Code and similar tools:
**Use `@` mentions to include files**:
@src/api/users.ts @src/lib/database.ts
Implement createUser endpoint following patterns in users.ts
**Reference CLAUDE.md files**:
Claude Code automatically loads CLAUDE.md files from:
- Root: /CLAUDE.md (project-wide patterns)
- Domain: /src/api/CLAUDE.md (API-specific patterns)
Ensure these files contain:
- Architecture overview
- Common patterns with examples
- Coding conventions
- Anti-patterns to avoid
**Include relevant test files**:
@tests/api/users.test.ts
Implement createUser endpoint that passes these tests
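For illustration, a hypothetical test file of this kind (sketched assuming Vitest; the import path and `createUserHandler` name are made up) gives the AI concrete behavior to target:

```typescript
// Hypothetical example of a test file worth including as context.
import { describe, expect, it } from 'vitest';
import { createUserHandler } from '../../src/api/users/create'; // assumed path

describe('createUser', () => {
  it('rejects an invalid email', async () => {
    const result = await createUserHandler({ email: 'not-an-email', name: 'Ada' });
    expect(result.success).toBe(false);
  });

  it('returns the created user on success', async () => {
    const result = await createUserHandler({ email: '[email protected]', name: 'Ada' });
    expect(result.success).toBe(true);
  });
});
```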
### Context Layer Success Criteria
You've exhausted the context layer when:
- ✓ All relevant code files are included
- ✓ Architecture and design patterns are explained
- ✓ Database schemas and API contracts are provided
- ✓ Error messages and stack traces are included
- ✓ Working examples are shown
- ✓ Domain-specific requirements are stated
**If AI still produces incorrect output**, proceed to Layer 2.
## Layer 2: Prompting (25% of Issues)
**Problem signature**: AI has context but produces output that doesn't meet requirements.
**Root cause**: Instructions are ambiguous, missing edge cases, or lack clear success criteria.
### Prompting Debugging Checklist
#### 1. Add Specific Examples of Desired Output
**Before** (vague requirement):
Prompt: “Format user data for display”
AI generates: Generic JSON formatting (not what you wanted)
**After** (specific example):
Prompt: “Format user data for display

Expected output:

Input: { id: 1, email: '[email protected]', createdAt: '2025-01-15T10:30:00Z' }
Output: '[email protected] (joined Jan 15, 2025)'

Input: { id: 2, email: '[email protected]', createdAt: '2024-12-01T14:20:00Z' }
Output: '[email protected] (joined Dec 1, 2024)'
”
AI generates: Exact format you specified
**Result**: Unambiguous output format.
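For comparison, one plausible implementation that satisfies those examples — a sketch only; the `UserRecord` type and function name are assumptions:

```typescript
// Sketch of the formatter the example-driven prompt pins down.
interface UserRecord {
  id: number;
  email: string;
  createdAt: string; // ISO 8601 timestamp
}

export function formatUserForDisplay(user: UserRecord): string {
  const joined = new Date(user.createdAt).toLocaleDateString('en-US', {
    month: 'short',
    day: 'numeric',
    year: 'numeric',
    timeZone: 'UTC',
  });
  return `${user.email} (joined ${joined})`;
}

// formatUserForDisplay({ id: 1, email: '[email protected]', createdAt: '2025-01-15T10:30:00Z' })
// → '[email protected] (joined Jan 15, 2025)'
```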
#### 2. Include Edge Cases and Constraints
**Before**:
Prompt: “Parse date from string”
AI generates: Simple parsing (doesn’t handle edge cases)
**After**:
Prompt: “Parse date from string
Edge cases to handle:
- Invalid date strings: return null
- Missing date: return null
- Timezone handling: parse as UTC
- Date format: ISO 8601 only
- Out of range dates: return null
Constraints:
- Never throw exceptions
- Return type: Date | null
- Use date-fns library (already installed)
“
AI generates: Robust parsing with all edge cases handled
**Result**: Production-ready error handling.
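As a reference point, a correct result could look roughly like this sketch, which assumes the date-fns library the prompt says is installed:

```typescript
import { isValid, parseISO } from 'date-fns';

// Sketch of the behavior the refined prompt asks for: ISO 8601 only,
// Date | null return type, and no exceptions for bad input.
export function parseDate(input: string | null | undefined): Date | null {
  if (!input) return null; // missing date → null

  const parsed = parseISO(input); // ISO 8601 parsing via date-fns
  if (!isValid(parsed)) return null; // invalid or out-of-range → null

  // Note: offset-less ISO strings are interpreted as local time by parseISO;
  // normalizing them to UTC is left out of this sketch.
  return parsed;
}

// parseDate('2025-01-15T10:30:00Z') → Date; parseDate('not a date') → null
```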
#### 3. Provide Clear Success Criteria
**Before**:
Prompt: “Optimize this function”
AI generates: Random micro-optimizations (unclear if successful)
**After**:
Prompt: “Optimize this function
Success criteria:
- Reduce execution time from 500ms to <100ms
- Maintain existing behavior (all tests pass)
- Handle lists up to 10,000 items
- Reduce memory allocation by 50%
Current performance:
- 500ms for 1,000 items
- 5,000ms for 10,000 items
- 50MB memory allocation
“
AI generates: Targeted optimizations addressing specific metrics
**Result**: Measurable improvement with clear target.
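A small measurement harness like the following sketch (the `processItems` call and the thresholds are placeholders taken from the prompt) turns those criteria into something checkable:

```typescript
import { performance } from 'node:perf_hooks';

// Run a function and fail loudly if it misses the stated time budget.
export function assertFastEnough<T>(label: string, maxMs: number, fn: () => T): T {
  const start = performance.now();
  const result = fn();
  const elapsed = performance.now() - start;
  if (elapsed > maxMs) {
    throw new Error(`${label}: ${elapsed.toFixed(1)}ms exceeds target of ${maxMs}ms`);
  }
  return result;
}

// Usage against the criteria above (processItems is a placeholder):
// assertFastEnough('1,000 items', 100, () => processItems(thousandItems));
```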
#### 4. Break Complex Tasks into Steps
**Before** (monolithic task):
Prompt: “Implement user authentication with OAuth, email verification, password reset, and 2FA”
AI generates: Incomplete or buggy implementation (task too complex)
**After** (stepwise):
Prompt: “Implement user authentication – STEP 1: Basic email/password login
Requirements for this step:
- Accept email and password
- Validate against database
- Return JWT token on success
- Return error on failure
We’ll add OAuth, email verification, and 2FA in subsequent steps.
“
AI generates: Clean, focused implementation of Step 1
[Then continue with Step 2, Step 3, etc.]
**Result**: Incremental, testable progress.
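For a rough sense of what Step 1 might produce, here is a sketch that assumes `bcrypt` and `jsonwebtoken`; the `findUserByEmail` dependency is hypothetical:

```typescript
import { compare } from 'bcrypt';
import { sign } from 'jsonwebtoken';

interface LoginInput { email: string; password: string }
type LoginResult =
  | { success: true; token: string }
  | { success: false; error: string };

// Step 1 only: email/password against the database, JWT on success.
// OAuth, email verification, and 2FA are deliberately out of scope here.
export async function loginStep1(
  input: LoginInput,
  findUserByEmail: (email: string) => Promise<{ id: string; passwordHash: string } | null>,
  jwtSecret: string
): Promise<LoginResult> {
  const user = await findUserByEmail(input.email);
  if (!user) return { success: false, error: 'Invalid credentials' };

  const passwordValid = await compare(input.password, user.passwordHash);
  if (!passwordValid) return { success: false, error: 'Invalid credentials' };

  const token = sign({ sub: user.id }, jwtSecret, { expiresIn: '1h' });
  return { success: true, token };
}
```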
#### 5. Use Structured Formats
**Before** (narrative prompt):
Prompt: “We need a function that takes user data and validates it, making sure the email is correct and the password meets our requirements, and also checking if the username is available”
AI generates: Confused implementation mixing concerns
**After** (structured prompt):
````markdown
Prompt:

## Function Signature

```typescript
function validateUser(data: UserInput): ValidationResult
```

## Validation Rules

- Email:
  - Required: Yes
  - Format: Valid email regex
  - Uniqueness: Check database
- Password:
  - Required: Yes
  - Min length: 8 characters
  - Must contain: uppercase, lowercase, number
- Username:
  - Required: Yes
  - Min length: 3 characters
  - Max length: 20 characters
  - Characters: alphanumeric, underscore, dash only
  - Uniqueness: Check database

## Return Type

```typescript
interface ValidationResult {
  valid: boolean;
  errors: Array<{ field: string; message: string }>;
}
```

## Example

```typescript
validateUser({ email: 'invalid', password: 'weak', username: 'ab' })
// Returns:
// {
//   valid: false,
//   errors: [
//     { field: 'email', message: 'Invalid email format' },
//     { field: 'password', message: 'Must contain uppercase letter' },
//     { field: 'username', message: 'Must be at least 3 characters' }
//   ]
// }
```
````

**AI generates**: Clean, structured validation with the exact interface.

**Result**: Precise implementation matching the spec.
### Prompting Layer Success Criteria
You’ve exhausted the prompting layer when:
- ✓ Specific examples of desired output are provided
- ✓ All edge cases and constraints are stated
- ✓ Clear success criteria are defined
- ✓ Complex tasks are broken into steps
- ✓ Structured formats (tables, JSON, code blocks) are used
**If AI still produces incorrect output**, proceed to Layer 3.

## Layer 3: Model Power (10% of Issues)

**Problem signature**: AI has context and clear instructions but still fails due to reasoning limitations.

**Root cause**: Task genuinely exceeds the model's capabilities.

### When to Escalate Model Power

Only escalate when:

- **Context and prompting are exhausted** (Layers 1 & 2 complete)
- **Task requires advanced reasoning**: Complex algorithms, architectural decisions, multi-step planning
- **Consistent failures**: Same error across multiple attempts with good context

### Model Escalation Strategy
```text
Level 1: Standard Model (Claude Sonnet, GPT-4 Turbo)
  ↓ (if fails with good context/prompting)
Level 2: Powerful Model (Claude Opus, GPT-4)
  ↓ (if still fails)
Level 3: Specialized Model (code-specific models for narrow tasks)
  ↓ (if still fails)
Level 4: Manual Override (human intervention needed)
```
### Example: Complex Architecture Decision

**Task**: Design a real-time collaborative editing system with conflict resolution

**Context & prompting** (from Layers 1 & 2):

- Existing codebase patterns
- Requirements and constraints
- Example use cases
- Success criteria

**Claude Sonnet** (standard model):
```text
Result: Generic CRDT implementation without considering specific constraints
Issue: Lacks deep reasoning about trade-offs and edge cases
```
**Claude Opus** (escalated):
```text
Result: Detailed analysis of CRDT vs. OT approaches, considering:
- Network latency characteristics
- Database consistency requirements
- Client-side storage constraints
- Conflict resolution strategies
Recommendation: Hybrid approach with specific implementation
Outcome: ✓ Successful
```
**Key insight**: Some tasks genuinely need more powerful reasoning.

### Model Power vs. Context Trade-off

**Important**: Don't use model power to compensate for missing context.
```text
❌ Bad approach:
"Opus will figure it out without context" (expensive, unreliable)

✓ Good approach:
"Provide context to Sonnet first, escalate to Opus only if needed"
```
**Cost implications**:
```text
Claude Sonnet: $3 per 1M input tokens
Claude Opus: $15 per 1M input tokens (5x more expensive)

Strategy:
1. Try Sonnet with good context (90% success, low cost)
2. Escalate to Opus only for the remaining 10% (higher cost, rare)

Result: Optimal cost/performance balance
```
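One way to encode that strategy, sketched with hypothetical helper names (the real calls depend on your SDK or agent tooling):

```typescript
// Hypothetical names: `tryModel` stands in for however your tooling calls a model
// and judges the result (tests pass, output matches the success criteria, etc.).
type Model = 'claude-sonnet' | 'claude-opus';

interface Attempt {
  model: Model;
  output: string;
  acceptable: boolean;
}

export async function generateWithEscalation(
  prompt: string,
  contextFiles: string[],
  tryModel: (model: Model, prompt: string, contextFiles: string[]) => Promise<Attempt>
): Promise<Attempt> {
  // Layers 1 & 2 are assumed done: context and prompt are already as good as you can make them.
  const sonnet = await tryModel('claude-sonnet', prompt, contextFiles);
  if (sonnet.acceptable) return sonnet; // the common, cheap case

  // Escalate to the more expensive model only for the hard remainder.
  return tryModel('claude-opus', prompt, contextFiles);
}
```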
### Model Power Success Criteria
You’ve exhausted model power when:
- ✓ Standard model fails with good context/prompting
- ✓ Powerful model fails with same inputs
- ✓ Specialized models fail (if applicable)
**If AI still fails**, proceed to Layer 4.

## Layer 4: Manual Override (5% of Issues)

**Problem signature**: No AI model can solve the problem, regardless of context or capability.

**Root cause**: Task requires human intuition, domain expertise, or is genuinely impossible.

### When to Go Manual

Recognize these scenarios early:
#### 1. Requires Deep Domain Expertise

**Example**: "Design a medical diagnosis algorithm for a rare disease"

**Why manual**: Requires specialized medical knowledge, liability concerns, regulatory compliance

**Approach**:
- Human expert designs the approach
- AI implements under expert guidance
- Expert validates all decisions

#### 2. Requires Human Intuition or Creativity

**Example**: "Design a brand identity that captures the essence of our mission"

**Why manual**: Subjective; requires understanding of brand values, market positioning, human emotional response

**Approach**:
- Human designer creates concepts
- AI helps with implementation/variations
- Human makes final creative decisions

#### 3. Ambiguous or Contradictory Requirements

**Example**: "Make it faster but more comprehensive" (contradiction)

**Why manual**: Requires stakeholder discussion to resolve trade-offs

**Approach**:
- Human facilitates requirements clarification
- Once requirements are clear, AI implements the solution

#### 4. Legacy System with Tribal Knowledge

**Example**: "Fix a bug in an undocumented 15-year-old Perl script"

**Why manual**: No context available; requires a debugging archaeological dig

**Approach**:
- Human debugs and documents findings
- AI helps refactor once behavior is understood
### Manual Override Pattern

Don't abandon AI entirely – use a hybrid approach:

1. Human: Solve the core problem manually
2. Human: Document the solution clearly
3. AI: Implement the solution following the documentation
4. Human: Review and validate
5. AI: Expand to related cases

**Example**:

```typescript
// Human: Manually solve first instance
// Problem: Parse complex legacy log format (no documentation)
// After manual analysis:
// Log format: [TIMESTAMP][LEVEL][MODULE:SUBMODULE] message | context:value
// Human creates example parser for one case:
function parseLogLine(line: string): LogEntry {
const timestampMatch = line.match(/^\[(.*?)\]/);
const timestamp = timestampMatch ? timestampMatch[1] : null;
// ... manual implementation ...
}
// AI: Now generate parser for all log levels using this pattern
// AI: Generate tests using the examples I provided
// AI: Add error handling for malformed lines
```
**Result**: Human solves hard problem, AI scales solution.
### Manual Override Success Criteria
Go manual when:
- ✓ All 3 previous layers exhausted (Context → Prompting → Model Power)
- ✓ Task requires specialized domain expertise
- ✓ Task requires subjective human judgment
- ✓ Requirements are ambiguous or contradictory
- ✓ No documentation exists for legacy system
**Important**: Manual doesn't mean "give up on AI" - it means **human-AI hybrid approach**.
## Practical Application
### Real-World Example: API Endpoint Failure
**Scenario**: AI generates API endpoint, but it crashes in production.
**Layer 1: Context** ✓
Add context:
```typescript
// Include existing endpoint patterns
@src/api/users/list.ts
@src/lib/api-handler.ts
// Include error logs
Error: Cannot read property 'map' of undefined
at getUsersHandler (api/users/search.ts:15)
// Include database schema
@prisma/schema.prisma
```

Result: AI sees the issue – it needs a null check before the `.map()` call.

**Fix generated**:

```typescript
export const searchUsersHandler = async (input: SearchInput): Promise<UserList> => {
const users = await prisma.user.findMany({
where: { name: { contains: input.query } },
});
// Added null check
if (!users) {
return { users: [], total: 0 };
}
return {
users: users.map(u => ({ id: u.id, name: u.name, email: u.email })),
total: users.length,
};
};
```
**Result**: ✓ Fixed at Layer 1 (context)
**Time**: 5 minutes
---
**Alternative scenario**: What if context didn't fix it?
**Layer 2: Prompting** ✓
Refine prompt:
```markdown
Prompt: "Implement searchUsers endpoint
Success criteria:
- Return empty array (not null) when no results
- Handle null/undefined from database gracefully
- Match return type exactly: { users: User[], total: number }
- Include error handling for database failures
Edge cases:
- Empty search query → return all users (paginated)
- Special characters in query → escape for SQL safety
- Database timeout → return error result
"
```

Result: AI generates a robust implementation with all edge cases

**Result**: ✓ Fixed at Layer 2 (prompting)
**Time**: 10 minutes total
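For illustration, the kind of implementation such a prompt tends to produce might look like this sketch (the `SearchInput`/`UserList` types and the shared `prisma` client are assumptions):

```typescript
// A sketch only – names and error-handling strategy are illustrative.
import { prisma } from '../lib/prisma'; // assumed shared Prisma client

interface SearchInput { query: string; page?: number; pageSize?: number }
interface UserSummary { id: string; name: string | null; email: string }
interface UserList { users: UserSummary[]; total: number }

export const searchUsersHandler = async (input: SearchInput): Promise<UserList> => {
  const pageSize = input.pageSize ?? 20;
  const skip = ((input.page ?? 1) - 1) * pageSize;

  // Empty query → return all users (paginated). Prisma parameterizes the filter,
  // so special characters in the query need no manual SQL escaping.
  const where = input.query ? { name: { contains: input.query } } : {};

  try {
    const [users, total] = await Promise.all([
      prisma.user.findMany({
        where,
        skip,
        take: pageSize,
        select: { id: true, name: true, email: true },
      }),
      prisma.user.count({ where }),
    ]);
    // Always an array, never null.
    return { users: users ?? [], total };
  } catch {
    // Database failure or timeout → surface a clear error
    // (simplified here; a real handler would use the project's error type).
    throw new Error('searchUsers failed: database unavailable');
  }
};
```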
---

**Alternative scenario**: What if prompting didn't fix it?

**Layer 3: Model Power** ✓

Escalate to Claude Opus:

```text
Same context + prompting, but with Opus’s stronger reasoning
Result: Opus recognizes complex interaction between pagination, filtering, and database query optimization
```

**Result**: ✓ Fixed at Layer 3 (model power)
**Time**: 20 minutes total
**Cost**: Higher (5x model cost), but rare
## Best Practices
### 1. Always Start with Layer 1 (Context)
**Never skip context**. It's tempting to jump to "let me try Opus" but:
```text
Context fixes: 60% (5 min avg)
Prompt fixes: 25% (10 min avg)
Model fixes: 10% (20 min avg)
Manual fixes: 5% (varies; assume 60 min)

Expected time with proper ordering:
0.6 × 5 + 0.25 × 10 + 0.1 × 20 + 0.05 × 60 = 10.5 min avg

Expected time starting with Model (wrong order):
Only 10% of issues are fixed by the model switch; the other 90% waste
~20 min before falling back to the normal path:
0.1 × 20 + 0.9 × (20 + 10.5) ≈ 29.5 min avg (plus wasted expensive model calls)
```
### 2. Don't Waste Tokens on Genuinely Hard Problems
**Key insight from the framework**: Some tasks need Opus.
Don’t try to save $0.10 by using Sonnet for a task that requires Opus. You’ll waste hours and still fail.
- ❌ Bad: Try Sonnet 10 times (10 × $0.015 = $0.15) → still fails → 2 hours wasted
- ✓ Good: Recognize the hard problem early → use Opus once ($0.075) → succeeds → 10 min
### 3. Build Context Libraries
Maintain reusable context:

```markdown
# /CLAUDE.md (root)

Architecture:
- Next.js 14 with App Router
- tRPC for API layer
- Prisma + PostgreSQL
- Deployed on Vercel

Patterns:
- All API handlers return Result<T, Error> (never throw)
- Use Zod for runtime validation
- Prefer server components over client

[... more patterns ...]
```

Result: Every AI request automatically includes this context.
### 4. Document Solutions for Future Context
When you solve a hard problem (Layer 3 or 4):
````markdown
# Add to CLAUDE.md

## Complex Log Parsing Pattern

When parsing legacy logs:

```typescript
// Use this pattern:
function parseLogLine(line: string): LogEntry | null {
  try {
    const [timestamp, level, module, ...rest] = line.split('|');
    return { timestamp, level, module, message: rest.join('|') };
  } catch {
    return null; // Invalid log line
  }
}
```

This pattern handles:
- Malformed lines (return null)
- Messages with | character (rejoin with |)
- Missing fields (handled by the try/catch)
````

Result: Next time, AI solves at Layer 1 (context) instead of Layer 4 (manual).
### 5. Track Success Rate by Layer
Monitor where fixes happen:

```typescript
interface DebugMetrics {
  contextFixes: number; // Layer 1
  promptFixes: number;  // Layer 2
  modelFixes: number;   // Layer 3
  manualFixes: number;  // Layer 4
}

// Expected distribution:
// Context: 60%
// Prompting: 25%
// Model: 10%
// Manual: 5%

// If your distribution is off:
// Too many prompt fixes → improve context (CLAUDE.md, examples)
// Too many model fixes → improve prompting (clearer instructions)
// Too many manual fixes → improve context + prompting
```
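A small helper like this sketch (hypothetical; it reuses the `DebugMetrics` interface above) turns raw counts into percentages you can compare against those targets:

```typescript
// Hypothetical helper; reuses the DebugMetrics interface defined above.
function fixDistribution(m: DebugMetrics): Record<keyof DebugMetrics, number> {
  const total = m.contextFixes + m.promptFixes + m.modelFixes + m.manualFixes;
  const pct = (n: number) => (total === 0 ? 0 : Math.round((n / total) * 100));
  return {
    contextFixes: pct(m.contextFixes),
    promptFixes: pct(m.promptFixes),
    modelFixes: pct(m.modelFixes),
    manualFixes: pct(m.manualFixes),
  };
}

// Example: { contextFixes: 40, promptFixes: 35, modelFixes: 20, manualFixes: 5 }
// → far too many prompt and model fixes; invest in CLAUDE.md and examples first.
```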
## Integration with Other Patterns
### Context Debugging + Hierarchical Context Patterns

Hierarchical context patterns prevent context issues:

- Without hierarchical context: ~60% of issues land at the context layer (frequent debugging)
- With hierarchical context: ~30% of issues land at the context layer (reduced debugging)

See: Hierarchical Context Patterns
### Context Debugging + Trust But Verify Protocol

The trust-but-verify protocol helps recognize Layer 4 scenarios:

- When to escalate to a human expert
- When a task needs human judgment
- When requirements need clarification

See: Trust But Verify Protocol

### Context Debugging + Model Switching Strategy

Model switching complements Layer 3:

- Context debugging determines when to switch
- Model switching determines which model to use
## Common Pitfalls

### ❌ Pitfall 1: Skipping to Model Power

**Mistake**: "Let me just try Opus"
**Problem**: Wastes expensive model calls on context issues
**Fix**: Always exhaust Layer 1 (context) first

### ❌ Pitfall 2: Adding Irrelevant Context

**Mistake**: Including the entire codebase "just in case"
**Problem**: Noise reduces signal and confuses the AI
**Fix**: Only include relevant files and examples

### ❌ Pitfall 3: Vague Prompting

**Mistake**: "Make it better" or "Fix this"
**Problem**: AI has no success criteria
**Fix**: Provide specific examples, edge cases, constraints

### ❌ Pitfall 4: Persisting When Manual Is Needed

**Mistake**: Trying 20+ iterations when the task needs human judgment
**Problem**: Wastes time and money
**Fix**: Recognize Layer 4 scenarios early and go hybrid

### ❌ Pitfall 5: Not Documenting Solutions

**Mistake**: Solving a hard problem manually without documenting it
**Problem**: The next person hits the same issue and wastes time
**Fix**: Add the solution to CLAUDE.md for future context
## Measuring Success

### Metrics to Track

1. **Fix rate by layer**
   - Target: Context 60%, Prompting 25%, Model 10%, Manual 5%
   - Actual: [measure your distribution]
2. **Average debugging time**
   - Target: <10 minutes per issue
   - Measure: Track time from "AI output wrong" to "AI output correct"
3. **Model escalation rate**
   - Target: <10% of tasks need a powerful model
   - Measure: Sonnet usage vs. Opus usage
4. **Manual override rate**
   - Target: <5% of tasks need manual work
   - Measure: Tasks where a human had to implement the core logic
### Success Indicators
You’re using the framework effectively when:
- ✓ Most issues resolve at Layer 1 (context)
- ✓ Average debugging time is <10 minutes
- ✓ Rare model escalations (cost-efficient)
- ✓ Manual overrides are documented for future context
- ✓ Same issues don’t recur (learning accumulated)
## Conclusion
The Context Debugging Framework provides a systematic, efficient approach to diagnosing and fixing AI code generation issues.
**Key Takeaways**:
- Debug hierarchically: Context (60%) → Prompting (25%) → Model (10%) → Manual (5%)
- Context first: Most issues stem from missing information, not model limitations
- Don’t skip layers: Starting with expensive solutions wastes time and money
- Document solutions: Turn Layer 3/4 fixes into Layer 1 context for next time
- Recognize limitations: Some tasks need human judgment – use hybrid approach
The result: Faster debugging, lower costs, and continuous improvement as solved problems become context for future tasks.
Instead of trial-and-error debugging that wastes hours, you have a proven protocol that resolves most issues in minutes.
## Related Concepts
- Hierarchical Context Patterns – Prevent context issues proactively
- Context Rot Auto-Compacting – Prevent context degradation in long sessions
- Progressive Disclosure Context – Load context only when needed
- MCP Server for Dynamic Project Context – Solve Layer 1 context issues with queryable project knowledge
- Clean Slate Trajectory Recovery – Escaping bad LLM trajectories
- Sliding Window History – Bounded state management for context retention
- Trust But Verify Protocol – Recognize when to escalate to humans
- Model Switching Strategy – Complement Layer 3 with smart model selection
- Prompt Caching Strategy – Make context-heavy debugging cost-efficient
- Five-Point Error Diagnostic Framework – Structured error analysis
- Human-First DX Philosophy – Clear context and documentation helps humans, which directly improves AI performance
- Error Messages as Training Data – ERRORS.md provides Layer 1 context for debugging
- Prevention Protocol – Turn debugged issues into systematic prevention measures
- Test-Based Regression Patching – Write failing tests to provide concrete verification targets
## References
- Claude API Documentation – Official Claude API documentation for understanding model capabilities
- Prompt Engineering Guide – Comprehensive guide to effective prompting techniques

