Five-Point Error Diagnostic Framework: Systematic LLM Error Reduction

James Phoenix

Summary

LLM errors often seem random and unpredictable, making it difficult to diagnose and prevent recurring issues. This framework provides a systematic approach by categorizing every LLM problem into one of five root causes: Context, Model, Rules, Testing, or Quality Gates. By addressing the root cause rather than symptoms, you permanently eliminate entire classes of errors.

The Problem

When LLMs generate incorrect code, developers waste time treating symptoms instead of diagnosing root causes. Errors appear random and unpredictable. The same mistakes recur despite fixes. Without a systematic diagnostic framework, teams struggle to improve LLM reliability over time.

The Solution

Every LLM error fits into one of five categories: Context Problems (missing information), Model Problems (insufficient capability), Rules Problems (unclear guidelines), Testing Problems (weak verification), or Quality Gate Problems (insufficient automation). By systematically diagnosing which category each error belongs to and applying the corresponding fix, you permanently reduce that error class in your system.

The Problem: Random, Recurring Errors

You’re working with an LLM to build a feature. It generates code that:

  • Doesn’t follow your project’s patterns
  • Uses non-existent functions
  • Passes tests but breaks in production
  • Violates documented conventions
  • Repeats the same mistakes you just fixed

You fix each issue as it appears, but errors keep recurring. It feels random and unpredictable.

Why This Happens

Most developers treat LLM errors as individual, isolated incidents:

Error occurs → Fix the specific issue → Move on

This approach is reactive and symptom-focused. It doesn’t address why the error happened.

Result: The same error types keep appearing because the root cause was never fixed.

The Solution: Five Root Causes

Every LLM error falls into one of five categories:

  1. Context Problem: LLM lacks information to make correct decisions
  2. Model Problem: Current model lacks capability for task complexity
  3. Rules Problem: CLAUDE.md/documentation doesn’t specify behavior
  4. Testing Problem: Tests don’t catch the error type
  5. Quality Gate Problem: No automated check enforces the requirement

Key Insight: By diagnosing which category each error belongs to, you can apply systematic fixes that eliminate entire error classes permanently.

The Framework

Step 1: Diagnose Root Cause

When the LLM produces incorrect output, ask:

“Which of the five root causes is this?”

┌─────────────────────────────────────────────────────────┐
│ DIAGNOSTIC QUESTIONS                                    │
├─────────────────────────────────────────────────────────┤
│ 1. Context: Did LLM have relevant examples/patterns?   │
│ 2. Model: Is task too complex for current model?       │
│ 3. Rules: Does CLAUDE.md specify this behavior?        │
│ 4. Testing: Would better tests catch this?             │
│ 5. Quality Gates: Could automation prevent this?       │
└─────────────────────────────────────────────────────────┘

Step 2: Apply Corresponding Fix

Each root cause has specific solutions:

type RootCause = 
  | "context"      // → Inject relevant examples
  | "model"        // → Use more powerful model
  | "rules"        // → Update CLAUDE.md
  | "testing"      // → Add/improve tests
  | "quality-gate" // → Add automated checks

Step 3: Verify Fix

Re-run the original task and confirm the error is eliminated:

# Before fix
$ claude "Implement user authentication"
# → Generates code without password hashing

# After fix (added rule to CLAUDE.md)
$ claude "Implement user authentication"
# → Generates code WITH password hashing ✓

Step 4: Systematically Remove Error Class

Critical insight: Each fix doesn’t just solve one instance—it eliminates that entire error class from your system.

Error occurs → Diagnose root cause → Apply systematic fix → Error class eliminated forever

Root Cause 1: Context Problems

Symptoms

  • LLM generates code that doesn’t match existing patterns
  • LLM asks questions about basic project structure
  • LLM references non-existent files or functions
  • LLM produces generic code without project-specific knowledge

Example

$ claude "Add pagination to the users API"

# LLM generates:
// ❌ Generic pagination (doesn't match project patterns)
app.get('/users', (req, res) => {
  const page = parseInt(req.query.page) || 1;
  const limit = parseInt(req.query.limit) || 10;
  // ...
});

But your project uses a standardized pagination helper:

// ✓ Project pattern
import { paginate } from './utils/pagination';

app.get('/users', async (req, res) => {
  const users = await paginate(User, req.query);
  res.json(users);
});
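
If you don't already have one, here is a minimal sketch of what such a `paginate()` helper might look like (the signature and the Sequelize-style `findAndCountAll` call are assumptions; adapt to your ORM):

```typescript
// utils/pagination.ts (hypothetical): one possible shape for the shared helper
export async function paginate(model: any, query: Record<string, any>) {
  const page = Math.max(parseInt(query.page, 10) || 1, 1);
  const limit = Math.min(parseInt(query.limit, 10) || 10, 100); // cap page size
  const offset = (page - 1) * limit;

  // Sequelize-style call; swap for your ORM's equivalent
  const { rows, count } = await model.findAndCountAll({ limit, offset });
  return { data: rows, page, limit, total: count };
}
```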

Diagnosis

This is a Context Problem. The LLM didn’t know about your pagination helper because it wasn’t in the context.

Solutions

  1. Inject relevant examples:

    $ claude "Add pagination to the users API. Use the same pattern as products API."
    # Include src/api/products.ts in context
    
  2. Add hierarchical CLAUDE.md files:

    <!-- src/api/CLAUDE.md -->
    # API Patterns
    
    ## Pagination
    
    Always use `paginate()` helper from `utils/pagination.ts`:
    
    ```typescript
    import { paginate } from '../utils/pagination';
    
    app.get('/endpoint', async (req, res) => {
      const results = await paginate(Model, req.query);
      res.json(results);
    });
    ```
    
  3. Reference existing implementations:

    $ claude "Add pagination following the pattern in src/api/products.ts"
    

Result

Permanent fix: Once pagination patterns are documented in src/api/CLAUDE.md, the LLM will always use the correct pattern for API pagination.

Root Cause 2: Model Problems

Symptoms

  • LLM fails on complex architecture decisions
  • LLM struggles with multi-step reasoning
  • LLM generates incomplete solutions
  • LLM makes the same mistakes repeatedly despite context

Example

$ claude "Refactor the auth system to support OAuth, SAML, and JWT simultaneously"

# LLM generates incomplete solution:
# - Only implements OAuth
# - Doesn't handle provider switching
# - Missing integration with existing session management

Diagnosis

This is a Model Problem. The task requires complex architectural reasoning that the current model struggles with.

Solutions

  1. Switch to more powerful model:

    # If using Claude Sonnet
    $ claude --model opus "Refactor auth system..."
    
    # The same idea applies with other tools/providers:
    # escalate from a smaller model to the most capable one available
    
  2. Break complex task into smaller steps:

    # Instead of one large task:
    $ claude "Step 1: Design auth provider interface"
    $ claude "Step 2: Implement OAuth provider"
    $ claude "Step 3: Implement SAML provider"
    $ claude "Step 4: Add provider switching logic"
    
  3. Use chain-of-thought prompting:

    $ claude "Before implementing, first:
    1. Analyze existing auth system
    2. Design provider interface
    3. Plan migration strategy
    Then implement the refactoring"
    

Result

Permanent fix: Document in your project’s CLAUDE.md which model to use for complex tasks:

<!-- CLAUDE.md -->
# Model Selection Guidelines

- **Simple tasks** (< 50 lines): Claude Sonnet
- **Complex refactoring** (> 100 lines): Claude Opus
- **Architecture design**: Claude Opus with chain-of-thought

Root Cause 3: Rules Problems

Symptoms

  • LLM violates documented patterns (despite having context)
  • LLM doesn’t handle edge cases correctly
  • LLM repeats the same pattern violations
  • LLM generates code that breaks project conventions

Example

$ claude "Add endpoint to delete user accounts"

# LLM generates:
// ❌ Hard delete (violates project rules)
app.delete('/users/:id', async (req, res) => {
  await User.destroy({ where: { id: req.params.id } });
  res.json({ success: true });
});

But your project has a strict “soft delete only” policy (for audit compliance).

Diagnosis

This is a Rules Problem. The LLM had context but didn’t know about the soft delete requirement.

Solutions

  1. Add specific rules to CLAUDE.md:

    <!-- CLAUDE.md -->
    # Data Deletion Rules
    
    **NEVER use hard deletes.** Always use soft deletes:
    
    ```typescript
    // ❌ NEVER do this
    await Model.destroy({ where: { id } });
    
    // ✓ ALWAYS do this
    await Model.update({ deletedAt: new Date() }, { where: { id } });
    ```
    

    Rationale: Audit compliance requires data retention.

  2. Document edge cases with examples:

    ## Edge Cases
    
    ### Deleting User with Active Subscriptions
    
    Before soft-deleting user:
    1. Cancel all active subscriptions
    2. Send cancellation emails
    3. Log deletion reason
    4. Then soft-delete user record
    
  3. Add “DO NOT” anti-patterns:

    ## Anti-Patterns
    
    ❌ **DO NOT** use `.destroy()`
    ❌ **DO NOT** use `DELETE FROM` in raw SQL
    ❌ **DO NOT** permanently remove user data
    
    ✓ **DO** use `.update({ deletedAt })`
    ✓ **DO** check for related records first
    ✓ **DO** log deletion with reason
    

Result

Permanent fix: Once soft delete rules are in CLAUDE.md, the LLM will never generate hard deletes again.
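
A complementary step is to wrap the approved pattern in a small shared helper so there is only one obvious way to delete. A minimal sketch, assuming the Sequelize-style models used in the examples above:

```typescript
// utils/softDelete.ts (hypothetical): the only sanctioned way to "delete" records
export async function softDelete(model: any, id: string | number): Promise<void> {
  // Marks the record as deleted instead of removing it (audit compliance)
  await model.update({ deletedAt: new Date() }, { where: { id } });
}
```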

Root Cause 4: Testing Problems

Symptoms

  • Code passes tests but doesn’t work correctly in production
  • Tests focus on happy path only
  • Tests check for presence, not behavior
  • LLM generates code that satisfies weak tests

Example

$ claude "Add input validation to user registration"

# LLM generates code that passes this weak test:
// ❌ Weak test (checks presence, not behavior)
test('validates email', () => {
  const result = validateEmail('invalid');
  expect(result).toBeDefined(); // Just checks it returns something
});

The code “validates” by always returning true:

// ❌ Passes weak test but doesn't actually validate
function validateEmail(email: string): boolean {
  return true; // Always passes!
}

Diagnosis

This is a Testing Problem. The test is too weak to catch invalid implementations.

Solutions

  1. Test behavior, not presence:

    // ✓ Strong test (checks actual behavior)
    describe('validateEmail', () => {
      test('accepts valid emails', () => {
        expect(validateEmail('user@example.com')).toBe(true);
        expect(validateEmail('first.last@sub.example.com')).toBe(true);
      });
      
      test('rejects invalid emails', () => {
        expect(validateEmail('invalid')).toBe(false);
        expect(validateEmail('@example.com')).toBe(false);
        expect(validateEmail('user@')).toBe(false);
        expect(validateEmail('')).toBe(false);
      });
    });
    
  2. Test edge cases and error scenarios:

    test('handles special characters', () => {
      expect(validateEmail('user+tag@example.com')).toBe(true);
    });
    
    test('handles very long emails', () => {
      const longEmail = 'a'.repeat(64) + '@example.com';
      expect(validateEmail(longEmail)).toBe(true);
    });
    
  3. Focus on integration, not units:

    // Instead of testing validateEmail() in isolation,
    // test the whole registration flow:
    test('registration rejects invalid email', async () => {
      const response = await request(app)
        .post('/register')
        .send({ email: 'invalid', password: 'secure123' });
      
      expect(response.status).toBe(400);
      expect(response.body.error).toContain('Invalid email');
    });
    

Result

Permanent fix: Strong tests force the LLM to generate correct implementations. Weak tests allow broken code to pass.
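
For completeness, a minimal implementation that satisfies the stronger tests above. The regex is deliberately simple and is only a sketch, not a full RFC 5322 validator:

```typescript
// Minimal sketch: enough to pass the behavioural tests above
export function validateEmail(email: string): boolean {
  // Requires non-empty local part, an @, and a dotted domain; no whitespace
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email);
}
```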

Root Cause 5: Quality Gate Problems

Symptoms

  • Code compiles but violates architectural patterns
  • Linting passes but conventions are broken
  • Code works but doesn’t follow project standards
  • Subtle bugs slip through (null checks, type errors, etc.)

Example

$ claude "Add user profile update endpoint"

# LLM generates:
// ❌ Compiles and lints, but violates architecture
app.put('/users/:id', async (req, res) => {
  // Direct database access in route handler (bad!)
  const user = await db.query('UPDATE users SET ... WHERE id = ?', [req.params.id]);
  res.json(user);
});

Your architecture requires service layer separation, but there’s no automated check for this.
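
For contrast, the shape the handler should take under that architecture might look like this (`userService.updateProfile` is a hypothetical service method, not an existing API):

```typescript
// ✓ Route handler delegates to the service layer; no direct db access here
import { userService } from '../services/userService'; // hypothetical module

app.put('/users/:id', async (req, res) => {
  const user = await userService.updateProfile(req.params.id, req.body);
  res.json(user);
});
```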

Diagnosis

This is a Quality Gate Problem. The code passes TypeScript and ESLint, but violates architectural rules that aren’t automated.

Solutions

  1. Add custom ESLint rules:

    // .eslintrc.js
    module.exports = {
      rules: {
        'no-restricted-imports': ['error', {
          patterns: [{
            group: ['**/db', '**/database'],
            message: 'Import services, not db directly in routes'
          }]
        }]
      }
    };
    
  2. Use AST-grep for pattern enforcement:

    # .ast-grep/rules/no-db-in-routes.yml
    id: no-db-in-routes
    language: typescript
    rule:
      pattern: |
        app.$METHOD($PATH, async ($REQ, $RES) => {
          $$$
          await db.$CALL($$$)
          $$$
        })
    message: "Don't access db directly in route handlers. Use services."
    
  3. Add pre-commit hooks:

    # .husky/pre-commit
    #!/bin/bash
    npm run type-check
    npm run lint
    npm run test
    ast-grep scan  # Architectural pattern checks
    
  4. Enable stricter TypeScript:

    // tsconfig.json
    {
      "compilerOptions": {
        "strict": true,
        "noImplicitAny": true,
        "strictNullChecks": true,
        "noUncheckedIndexedAccess": true
      }
    }
    

Result

Permanent fix: Once automated checks exist, the LLM can’t generate code that violates patterns (hooks will block it).

Real-World Workflow

Scenario: Building a payment processing feature

# Initial attempt
$ claude "Add Stripe payment processing"

# Error 1: LLM uses wrong API version

Diagnosis: Context Problem – LLM doesn’t know which Stripe API version you use

Fix: Add to CLAUDE.md:

# Payment Processing

Use Stripe API v2023-10-16:
```typescript
import Stripe from 'stripe';
const stripe = new Stripe(process.env.STRIPE_KEY, {
  apiVersion: '2023-10-16'
});

```

**Result**: LLM now always uses correct API version ✓

---

# Second attempt
$ claude "Add Stripe payment processing"

# Error 2: LLM doesn't handle webhook signature verification

Diagnosis: Rules Problem – CLAUDE.md doesn’t specify webhook security

Fix: Add to CLAUDE.md:

## Webhook Security

**ALWAYS verify webhook signatures:**

```typescript
app.post('/webhooks/stripe', (req, res) => {
  const sig = req.headers['stripe-signature'];
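  // NOTE: constructEvent needs the raw (unparsed) request body, so this route
  // is assumed to be mounted with express.raw({ type: 'application/json' })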
  
  // ✓ REQUIRED: Verify signature
  const event = stripe.webhooks.constructEvent(
    req.body,
    sig,
    process.env.STRIPE_WEBHOOK_SECRET
  );
  
  // Process event...
});

```

**Result**: LLM now always verifies webhook signatures ✓

---

# Third attempt
$ claude "Add Stripe payment processing"

# Error 3: Code passes tests but fails in production (idempotency)

Diagnosis: Testing Problem – Tests don’t check idempotency

Fix: Add integration test:

test('payment processing is idempotent', async () => {
  const idempotencyKey = 'test-key-123';
  
  // First charge
  const charge1 = await processPayment({
    amount: 1000,
    idempotencyKey
  });
  
  // Second charge with same key (should return same result)
  const charge2 = await processPayment({
    amount: 1000,
    idempotencyKey
  });
  
  expect(charge1.id).toBe(charge2.id); // Same charge
});
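
The test above assumes a `processPayment()` wrapper that forwards an idempotency key to Stripe. A minimal sketch (the wrapper name and parameters are assumptions; the `stripe` client is the one configured in the CLAUDE.md snippet above):

```typescript
// Hypothetical wrapper exercised by the idempotency test above
async function processPayment({ amount, idempotencyKey }: {
  amount: number;
  idempotencyKey: string;
}) {
  // Stripe deduplicates requests sent with the same idempotency key,
  // returning the original PaymentIntent instead of charging twice
  return stripe.paymentIntents.create(
    { amount, currency: 'usd' },
    { idempotencyKey }
  );
}
```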

**Result**: LLM now generates idempotent payment code ✓

---

# Fourth attempt
$ claude "Add Stripe payment processing"

# Error 4: LLM logs sensitive card data

Diagnosis: Quality Gate Problem – No automated check for PCI compliance

Fix: Add AST-grep rule:

id: no-log-card-data
language: typescript
message: "Never log card/payment data (PCI compliance)"
rule:
  pattern: logger.$METHOD($$$, $CARD, $$$)
constraints:
  CARD:
    regex: "card|payment|stripe"

**Result**: Pre-commit hook blocks code that logs sensitive data ✓


Final State

After four iterations, you’ve systematically eliminated four error classes:

  1. Context: Stripe patterns documented
  2. Rules: Security requirements specified
  3. Testing: Idempotency verified
  4. Quality Gates: PCI compliance automated

Future payment features: LLM will never make these four mistake types again.

Best Practices

1. Diagnose Before Fixing

# ❌ React immediately
$ claude "Fix the bug in auth.ts"

# ✓ Diagnose first
# Ask: "Is this Context, Model, Rules, Testing, or Quality Gate?"
# Then apply systematic fix

2. Fix Root Cause, Not Symptom

# ❌ Symptom fix
$ claude "Change line 47 to use bcrypt.hash instead of plain text"

# ✓ Root cause fix
# Add to CLAUDE.md:
# "ALWAYS hash passwords with bcrypt before storing"
# Now ALL future password code will be correct

3. Document Every Fix

Maintain a log of diagnosed issues:

<!-- PROJECT_LOG.md -->
# Error Diagnosis Log

## 2025-11-01: Password Hashing

- **Error**: LLM stored passwords in plain text
- **Root Cause**: Rules Problem
- **Fix**: Added password security rules to CLAUDE.md
- **Result**: All future auth code includes hashing ✓

## 2025-10-28: Pagination Pattern

- **Error**: LLM used custom pagination instead of helper
- **Root Cause**: Context Problem  
- **Fix**: Added pagination examples to src/api/CLAUDE.md
- **Result**: All future API endpoints use standard helper ✓

This log helps you:

  • Track improvement over time
  • Identify recurring patterns
  • Share learnings with team

4. Apply Multiple Fixes When Needed

Some errors have multiple root causes:

Error: LLM generates SQL injection vulnerability

Root Causes:
1. Rules Problem: CLAUDE.md doesn't prohibit raw SQL
2. Testing Problem: No security tests
3. Quality Gate Problem: No SQL injection scanner

Fixes:
1. Add rule: "NEVER use raw SQL, always use ORM"
2. Add test: Check all inputs are parameterized
3. Add hook: Run SQL injection scanner pre-commit

Result: Triple-layer defense against SQL injection.

5. Measure Improvement

Track error reduction over time:

Week 1: 15 errors requiring fixes
Week 2: 12 errors (3 error classes eliminated)
Week 3: 7 errors (5 more classes eliminated)
Week 4: 3 errors (mostly edge cases)

Goal: Exponential error reduction as systematic fixes compound.

When NOT to Use

This framework is overkill for:

❌ One-off scripts

# Throwaway data migration script
$ claude "Write script to migrate users table"
# Just fix inline, don't systematize

❌ Exploration/prototyping

# Trying different approaches
$ claude "Show me 3 ways to implement caching"
# No need for systematic fixes during exploration

❌ Simple, isolated bugs

# Typo in variable name
$ claude "Fix typo: `usre` should be `user`"
# Simple one-off fix

✓ Always use for production code

Apply systematic diagnosis for:

  • Feature development
  • Bug fixes in core systems
  • Refactoring existing code
  • Adding new API endpoints
  • Security-critical code

Common Pitfalls

Pitfall 1: Fixing Symptoms Instead of Root Causes

# ❌ Symptom-focused
$ claude "Change this line to add password hashing"
# Only fixes one instance

# ✓ Root cause-focused  
# Add rule to CLAUDE.md about password security
# Fixes ALL future instances

Pitfall 2: Assuming Single Root Cause

Error: Authentication tokens expire too quickly

Don't assume: "This is just a Rules Problem"

Actually multiple causes:
1. Rules: Token lifetime not specified
2. Testing: No tests for token expiration
3. Context: Didn't know existing token patterns

Solution: Check all five categories systematically.

Pitfall 3: Not Verifying Fixes

# ❌ Apply fix without verification
# Add rule to CLAUDE.md
# Assume it's fixed

# ✓ Verify fix works
# Add rule to CLAUDE.md
$ claude "Implement user authentication" # Re-run original task
# Confirm password hashing is now included ✓

Pitfall 4: Over-relying on One Fix Type

Team only adds rules to CLAUDE.md
 → CLAUDE.md becomes huge and unwieldy
 → LLM can't fit all rules in context

Better approach:
- Rules: High-level patterns and anti-patterns
- Quality Gates: Automate what can be checked
- Testing: Verify behavior programmatically
- Context: Provide examples, not rules

Balance across all five categories.

Measuring Success

Key Metrics

  1. Error recurrence rate

    • % of errors that recur after “fixing”
    • Target: < 5% (systematic fixes prevent recurrence)
  2. Time to diagnose

    • Minutes spent identifying root cause
    • Target: < 5 minutes (framework makes diagnosis fast)
  3. Error reduction trend

    • Weekly count of LLM errors requiring fixes
    • Target: Exponential decrease over time
  4. Fix durability

    • How long fixes remain effective
    • Target: Permanent (error class eliminated)

Success Indicators

✓ Errors become predictable (fit into five categories)
✓ Same error type never recurs after systematic fix
✓ Team shares fix patterns (documented in CLAUDE.md)
✓ New team members benefit from accumulated fixes
✓ LLM reliability improves over time without model changes

Conclusion

The Five-Point Error Diagnostic Framework transforms LLM error handling from reactive symptom-fixing to proactive root cause elimination.

Core Principle: Every LLM error fits into one of five categories:

  1. Context: Missing information
  2. Model: Insufficient capability
  3. Rules: Unclear guidelines
  4. Testing: Weak verification
  5. Quality Gates: Insufficient automation

Systematic Process:

  1. Diagnose which of the five root causes
  2. Apply corresponding systematic fix
  3. Verify error is eliminated
  4. Document fix for future reference

Result: Each fix permanently eliminates that error class, leading to exponential improvement in LLM reliability.

Key Insight: Stop treating LLM errors as random incidents. They’re systematic problems with systematic solutions.
