Five-Point Error Diagnostic Framework: Systematic LLM Error Reduction

James Phoenix

Summary

LLM errors often seem random and unpredictable, making it difficult to diagnose and prevent recurring issues. This framework provides a systematic approach by categorizing every LLM problem into one of five root causes: Context, Model, Rules, Testing, or Quality Gates. By addressing the root cause rather than symptoms, you permanently eliminate entire classes of errors.

The Problem

When LLMs generate incorrect code, developers waste time treating symptoms instead of diagnosing root causes. Errors appear random and unpredictable. The same mistakes recur despite fixes. Without a systematic diagnostic framework, teams struggle to improve LLM reliability over time.

The Solution

Every LLM error fits into one of five categories: Context Problems (missing information), Model Problems (insufficient capability), Rules Problems (unclear guidelines), Testing Problems (weak verification), or Quality Gate Problems (insufficient automation). By systematically diagnosing which category each error belongs to and applying the corresponding fix, you permanently reduce that error class in your system.

The Problem: Random, Recurring Errors

You’re working with an LLM to build a feature. It generates code that:

  • Doesn’t follow your project’s patterns
  • Uses non-existent functions
  • Passes tests but breaks in production
  • Violates documented conventions
  • Repeats the same mistakes you just fixed

You fix each issue as it appears, but errors keep recurring. It feels random and unpredictable.

Why This Happens

Most developers treat LLM errors as individual, isolated incidents:

Error occurs → Fix the specific issue → Move on

This approach is reactive and symptom-focused. It doesn’t address why the error happened.

Result: The same error types keep appearing because the root cause was never fixed.

The Solution: Five Root Causes

Every LLM error falls into one of five categories:

  1. Context Problem: LLM lacks information to make correct decisions
  2. Model Problem: Current model lacks capability for task complexity
  3. Rules Problem: CLAUDE.md/documentation doesn’t specify behavior
  4. Testing Problem: Tests don’t catch the error type
  5. Quality Gate Problem: No automated check enforces the requirement

Key Insight: By diagnosing which category each error belongs to, you can apply systematic fixes that eliminate entire error classes permanently.

The Framework

Step 1: Diagnose Root Cause

When the LLM produces incorrect output, ask:

“Which of the five root causes is this?”

┌─────────────────────────────────────────────────────────┐
│ DIAGNOSTIC QUESTIONS                                    │
├─────────────────────────────────────────────────────────┤
│ 1. Context: Did LLM have relevant examples/patterns?   │
│ 2. Model: Is task too complex for current model?       │
│ 3. Rules: Does CLAUDE.md specify this behavior?        │
│ 4. Testing: Would better tests catch this?             │
│ 5. Quality Gates: Could automation prevent this?       │
└─────────────────────────────────────────────────────────┘

Step 2: Apply Corresponding Fix

Each root cause has specific solutions:

type RootCause = 
  | "context"      // → Inject relevant examples
  | "model"        // → Use more powerful model
  | "rules"        // → Update CLAUDE.md
  | "testing"      // → Add/improve tests
  | "quality-gate" // → Add automated checks

Step 3: Verify Fix

Re-run the original task and confirm the error is eliminated:

# Before fix
$ claude "Implement user authentication"
# → Generates code without password hashing

# After fix (added rule to CLAUDE.md)
$ claude "Implement user authentication"
# → Generates code WITH password hashing ✓

Step 4: Systematically Remove Error Class

Critical insight: Each fix doesn’t just solve one instance—it eliminates that entire error class from your system.

Error occurs → Diagnose root cause → Apply systematic fix → Error class eliminated forever

Root Cause 1: Context Problems

Symptoms

  • LLM generates code that doesn’t match existing patterns
  • LLM asks questions about basic project structure
  • LLM references non-existent files or functions
  • LLM produces generic code without project-specific knowledge

Example

$ claude "Add pagination to the users API"

# LLM generates:
// ❌ Generic pagination (doesn't match project patterns)
app.get('/users', (req, res) => {
  const page = parseInt(req.query.page) || 1;
  const limit = parseInt(req.query.limit) || 10;
  // ...
});

But your project uses a standardized pagination helper:

// ✓ Project pattern
import { paginate } from './utils/pagination';

app.get('/users', async (req, res) => {
  const users = await paginate(User, req.query);
  res.json(users);
});
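
If you don't already have one, here is a minimal sketch of what such a `paginate()` helper might look like (the signature and the Sequelize-style `findAndCountAll` call are assumptions; adapt to your ORM):

```typescript
// utils/pagination.ts (hypothetical): one possible shape for the shared helper
export async function paginate(model: any, query: Record<string, any>) {
  const page = Math.max(parseInt(query.page, 10) || 1, 1);
  const limit = Math.min(parseInt(query.limit, 10) || 10, 100); // cap page size
  const offset = (page - 1) * limit;

  // Sequelize-style call; swap for your ORM's equivalent
  const { rows, count } = await model.findAndCountAll({ limit, offset });
  return { data: rows, page, limit, total: count };
}
```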

Diagnosis

This is a Context Problem. The LLM didn’t know about your pagination helper because it wasn’t in the context.

Solutions

  1. Inject relevant examples:

    $ claude "Add pagination to the users API. Use the same pattern as products API."
    # Include src/api/products.ts in context
    
  2. Add hierarchical CLAUDE.md files:

    <!-- src/api/CLAUDE.md -->
    # API Patterns
    
    ## Pagination
    
    Always use `paginate()` helper from `utils/pagination.ts`:
    
    ```typescript
    import { paginate } from '../utils/pagination';
    
    app.get('/endpoint', async (req, res) => {
      const results = await paginate(Model, req.query);
      res.json(results);
    });
    ```
    
  3. Reference existing implementations:

    $ claude "Add pagination following the pattern in src/api/products.ts"
    

Result

Permanent fix: Once pagination patterns are documented in src/api/CLAUDE.md, the LLM will always use the correct pattern for API pagination.

Root Cause 2: Model Problems

Symptoms

  • LLM fails on complex architecture decisions
  • LLM struggles with multi-step reasoning
  • LLM generates incomplete solutions
  • LLM makes the same mistakes repeatedly despite context

Example

$ claude "Refactor the auth system to support OAuth, SAML, and JWT simultaneously"

# LLM generates incomplete solution:
# - Only implements OAuth
# - Doesn't handle provider switching
# - Missing integration with existing session management

Diagnosis

This is a Model Problem. The task requires complex architectural reasoning that the current model struggles with.

Solutions

  1. Switch to more powerful model:

    # If using Claude Sonnet
    $ claude --model opus "Refactor auth system..."
    
    # The same idea applies with other tools/providers:
    # escalate from a smaller model to the most capable one available
    
  2. Break complex task into smaller steps:

    # Instead of one large task:
    $ claude "Step 1: Design auth provider interface"
    $ claude "Step 2: Implement OAuth provider"
    $ claude "Step 3: Implement SAML provider"
    $ claude "Step 4: Add provider switching logic"
    
  3. Use chain-of-thought prompting:

    $ claude "Before implementing, first:
    1. Analyze existing auth system
    2. Design provider interface
    3. Plan migration strategy
    Then implement the refactoring"
    

Result

Permanent fix: Document in your project’s CLAUDE.md which model to use for complex tasks:

<!-- CLAUDE.md -->
# Model Selection Guidelines

- **Simple tasks** (< 50 lines): Claude Sonnet
- **Complex refactoring** (> 100 lines): Claude Opus
- **Architecture design**: Claude Opus with chain-of-thought

Root Cause 3: Rules Problems

Symptoms

  • LLM violates documented patterns (despite having context)
  • LLM doesn’t handle edge cases correctly
  • LLM repeats the same pattern violations
  • LLM generates code that breaks project conventions

Example

$ claude "Add endpoint to delete user accounts"

# LLM generates:
// ❌ Hard delete (violates project rules)
app.delete('/users/:id', async (req, res) => {
  await User.destroy({ where: { id: req.params.id } });
  res.json({ success: true });
});

But your project has a strict “soft delete only” policy (for audit compliance).

Diagnosis

This is a Rules Problem. The LLM had context but didn’t know about the soft delete requirement.

Solutions

  1. Add specific rules to CLAUDE.md:

    <!-- CLAUDE.md -->
    # Data Deletion Rules
    
    **NEVER use hard deletes.** Always use soft deletes:
    
    ```typescript
    // ❌ NEVER do this
    await Model.destroy({ where: { id } });
    
    // ✓ ALWAYS do this
    await Model.update({ deletedAt: new Date() }, { where: { id } });
    ```
    

    Rationale: Audit compliance requires data retention.

  2. Document edge cases with examples:

    ## Edge Cases
    
    ### Deleting User with Active Subscriptions
    
    Before soft-deleting user:
    1. Cancel all active subscriptions
    2. Send cancellation emails
    3. Log deletion reason
    4. Then soft-delete user record
    
  3. Add “DO NOT” anti-patterns:

    ## Anti-Patterns
    
    ❌ **DO NOT** use `.destroy()`
    ❌ **DO NOT** use `DELETE FROM` in raw SQL
    ❌ **DO NOT** permanently remove user data
    
    ✓ **DO** use `.update({ deletedAt })`
    ✓ **DO** check for related records first
    ✓ **DO** log deletion with reason
    

Result

Permanent fix: Once soft delete rules are in CLAUDE.md, the LLM will never generate hard deletes again.
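
A complementary step is to wrap the approved pattern in a small shared helper so there is only one obvious way to delete. A minimal sketch, assuming the Sequelize-style models used in the examples above:

```typescript
// utils/softDelete.ts (hypothetical): the only sanctioned way to "delete" records
export async function softDelete(model: any, id: string | number): Promise<void> {
  // Marks the record as deleted instead of removing it (audit compliance)
  await model.update({ deletedAt: new Date() }, { where: { id } });
}
```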

Root Cause 4: Testing Problems

Symptoms

  • Code passes tests but doesn’t work correctly in production
  • Tests focus on happy path only
  • Tests check for presence, not behavior
  • LLM generates code that satisfies weak tests

Example

$ claude "Add input validation to user registration"

# LLM generates code that passes this weak test:
// ❌ Weak test (checks presence, not behavior)
test('validates email', () => {
  const result = validateEmail('invalid');
  expect(result).toBeDefined(); // Just checks it returns something
});

The code “validates” by always returning true:

// ❌ Passes weak test but doesn't actually validate
function validateEmail(email: string): boolean {
  return true; // Always passes!
}

Diagnosis

This is a Testing Problem. The test is too weak to catch invalid implementations.

Solutions

  1. Test behavior, not presence:

    // ✓ Strong test (checks actual behavior)
    describe('validateEmail', () => {
      test('accepts valid emails', () => {
        expect(validateEmail('user@example.com')).toBe(true);
        expect(validateEmail('first.last@sub.example.com')).toBe(true);
      });
      
      test('rejects invalid emails', () => {
        expect(validateEmail('invalid')).toBe(false);
        expect(validateEmail('@example.com')).toBe(false);
        expect(validateEmail('user@')).toBe(false);
        expect(validateEmail('')).toBe(false);
      });
    });
    
  2. Test edge cases and error scenarios:

    test('handles special characters', () => {
      expect(validateEmail('user+tag@example.com')).toBe(true);
    });
    
    test('handles very long emails', () => {
      const longEmail = 'a'.repeat(64) + '@example.com';
      expect(validateEmail(longEmail)).toBe(true);
    });
    
  3. Focus on integration, not units:

    // Instead of testing validateEmail() in isolation,
    // test the whole registration flow:
    test('registration rejects invalid email', async () => {
      const response = await request(app)
        .post('/register')
        .send({ email: 'invalid', password: 'secure123' });
      
      expect(response.status).toBe(400);
      expect(response.body.error).toContain('Invalid email');
    });
    

Result

Permanent fix: Strong tests force the LLM to generate correct implementations. Weak tests allow broken code to pass.
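
For completeness, a minimal implementation that satisfies the stronger tests above. The regex is deliberately simple and is only a sketch, not a full RFC 5322 validator:

```typescript
// Minimal sketch: enough to pass the behavioural tests above
export function validateEmail(email: string): boolean {
  // Requires non-empty local part, an @, and a dotted domain; no whitespace
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email);
}
```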

Root Cause 5: Quality Gate Problems

Symptoms

  • Code compiles but violates architectural patterns
  • Linting passes but conventions are broken
  • Code works but doesn’t follow project standards
  • Subtle bugs slip through (null checks, type errors, etc.)

Example

$ claude "Add user profile update endpoint"

# LLM generates:
// ❌ Compiles and lints, but violates architecture
app.put('/users/:id', async (req, res) => {
  // Direct database access in route handler (bad!)
  const user = await db.query('UPDATE users SET ... WHERE id = ?', [req.params.id]);
  res.json(user);
});

Your architecture requires service layer separation, but there’s no automated check for this.
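
For contrast, the shape the handler should take under that architecture might look like this (`userService.updateProfile` is a hypothetical service method, not an existing API):

```typescript
// ✓ Route handler delegates to the service layer; no direct db access here
import { userService } from '../services/userService'; // hypothetical module

app.put('/users/:id', async (req, res) => {
  const user = await userService.updateProfile(req.params.id, req.body);
  res.json(user);
});
```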

Diagnosis

This is a Quality Gate Problem. The code passes TypeScript and ESLint, but violates architectural rules that aren’t automated.

Solutions

  1. Add custom ESLint rules:

    // .eslintrc.js
    module.exports = {
      rules: {
        'no-restricted-imports': ['error', {
          patterns: [{
            group: ['**/db', '**/database'],
            message: 'Import services, not db directly in routes'
          }]
        }]
      }
    };
    
  2. Use AST-grep for pattern enforcement:

    # .ast-grep/rules/no-db-in-routes.yml
    id: no-db-in-routes
    language: typescript
    rule:
      pattern: |
        app.$METHOD($PATH, async ($REQ, $RES) => {
          $$$
          await db.$CALL($$$)
          $$$
        })
    message: "Don't access db directly in route handlers. Use services."
    
  3. Add pre-commit hooks:

    # .husky/pre-commit
    #!/bin/bash
    npm run type-check
    npm run lint
    npm run test
    ast-grep scan  # Architectural pattern checks
    
  4. Enable stricter TypeScript:

    // tsconfig.json
    {
      "compilerOptions": {
        "strict": true,
        "noImplicitAny": true,
        "strictNullChecks": true,
        "noUncheckedIndexedAccess": true
      }
    }
    

Result

Permanent fix: Once automated checks exist, the LLM can’t generate code that violates patterns (hooks will block it).

Real-World Workflow

Scenario: Building a payment processing feature

# Initial attempt
$ claude "Add Stripe payment processing"

# Error 1: LLM uses wrong API version

Diagnosis: Context Problem – LLM doesn’t know which Stripe API version you use

Fix: Add to CLAUDE.md:

# Payment Processing

Use Stripe API v2023-10-16:
```typescript
import Stripe from 'stripe';
const stripe = new Stripe(process.env.STRIPE_KEY, {
  apiVersion: '2023-10-16'
});

```

**Result**: LLM now always uses correct API version ✓

---

# Second attempt
$ claude "Add Stripe payment processing"

# Error 2: LLM doesn't handle webhook signature verification

Diagnosis: Rules Problem – CLAUDE.md doesn’t specify webhook security

Fix: Add to CLAUDE.md:

## Webhook Security

**ALWAYS verify webhook signatures:**

```typescript
app.post('/webhooks/stripe', (req, res) => {
  const sig = req.headers['stripe-signature'];
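  // NOTE: constructEvent needs the raw (unparsed) request body, so this route
  // is assumed to be mounted with express.raw({ type: 'application/json' })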
  
  // ✓ REQUIRED: Verify signature
  const event = stripe.webhooks.constructEvent(
    req.body,
    sig,
    process.env.STRIPE_WEBHOOK_SECRET
  );
  
  // Process event...
});

```

**Result**: LLM now always verifies webhook signatures ✓

---

# Third attempt
$ claude "Add Stripe payment processing"

# Error 3: Code passes tests but fails in production (idempotency)

Diagnosis: Testing Problem – Tests don’t check idempotency

Fix: Add integration test:

test('payment processing is idempotent', async () => {
  const idempotencyKey = 'test-key-123';
  
  // First charge
  const charge1 = await processPayment({
    amount: 1000,
    idempotencyKey
  });
  
  // Second charge with same key (should return same result)
  const charge2 = await processPayment({
    amount: 1000,
    idempotencyKey
  });
  
  expect(charge1.id).toBe(charge2.id); // Same charge
});
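
The test above assumes a `processPayment()` wrapper that forwards an idempotency key to Stripe. A minimal sketch (the wrapper name and parameters are assumptions; the `stripe` client is the one configured in the CLAUDE.md snippet above):

```typescript
// Hypothetical wrapper exercised by the idempotency test above
async function processPayment({ amount, idempotencyKey }: {
  amount: number;
  idempotencyKey: string;
}) {
  // Stripe deduplicates requests sent with the same idempotency key,
  // returning the original PaymentIntent instead of charging twice
  return stripe.paymentIntents.create(
    { amount, currency: 'usd' },
    { idempotencyKey }
  );
}
```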

**Result**: LLM now generates idempotent payment code ✓

---

# Fourth attempt
$ claude "Add Stripe payment processing"

# Error 4: LLM logs sensitive card data

Diagnosis: Quality Gate Problem – No automated check for PCI compliance

Fix: Add AST-grep rule:

id: no-log-card-data
language: typescript
message: "Never log card/payment data (PCI compliance)"
rule:
  pattern: logger.$METHOD($$$, $CARD, $$$)
constraints:
  CARD:
    regex: "card|payment|stripe"

**Result**: Pre-commit hook blocks code that logs sensitive data ✓


Final State

After four iterations, you’ve systematically eliminated four error classes:

  1. Context: Stripe patterns documented
  2. Rules: Security requirements specified
  3. Testing: Idempotency verified
  4. Quality Gates: PCI compliance automated

Future payment features: LLM will never make these four mistake types again.

Best Practices

1. Diagnose Before Fixing

# ❌ React immediately
$ claude "Fix the bug in auth.ts"

# ✓ Diagnose first
# Ask: "Is this Context, Model, Rules, Testing, or Quality Gate?"
# Then apply systematic fix

2. Fix Root Cause, Not Symptom

# ❌ Symptom fix
$ claude "Change line 47 to use bcrypt.hash instead of plain text"

# ✓ Root cause fix
# Add to CLAUDE.md:
# "ALWAYS hash passwords with bcrypt before storing"
# Now ALL future password code will be correct

3. Document Every Fix

Maintain a log of diagnosed issues:

<!-- PROJECT_LOG.md -->
# Error Diagnosis Log

## 2025-11-01: Password Hashing

- **Error**: LLM stored passwords in plain text
- **Root Cause**: Rules Problem
- **Fix**: Added password security rules to CLAUDE.md
- **Result**: All future auth code includes hashing ✓

## 2025-10-28: Pagination Pattern

- **Error**: LLM used custom pagination instead of helper
- **Root Cause**: Context Problem  
- **Fix**: Added pagination examples to src/api/CLAUDE.md
- **Result**: All future API endpoints use standard helper ✓

This log helps you:

  • Track improvement over time
  • Identify recurring patterns
  • Share learnings with team

4. Apply Multiple Fixes When Needed

Some errors have multiple root causes:

Error: LLM generates SQL injection vulnerability

Root Causes:
1. Rules Problem: CLAUDE.md doesn't prohibit raw SQL
2. Testing Problem: No security tests
3. Quality Gate Problem: No SQL injection scanner

Fixes:
1. Add rule: "NEVER use raw SQL, always use ORM"
2. Add test: Check all inputs are parameterized
3. Add hook: Run SQL injection scanner pre-commit

Result: Triple-layer defense against SQL injection.

5. Measure Improvement

Track error reduction over time:

Week 1: 15 errors requiring fixes
Week 2: 12 errors (3 error classes eliminated)
Week 3: 7 errors (5 more classes eliminated)
Week 4: 3 errors (mostly edge cases)

Goal: Exponential error reduction as systematic fixes compound.

When NOT to Use

This framework is overkill for:

❌ One-off scripts

# Throwaway data migration script
$ claude "Write script to migrate users table"
# Just fix inline, don't systematize

❌ Exploration/prototyping

# Trying different approaches
$ claude "Show me 3 ways to implement caching"
# No need for systematic fixes during exploration

❌ Simple, isolated bugs

# Typo in variable name
$ claude "Fix typo: `usre` should be `user`"
# Simple one-off fix

✓ Always use for production code

Apply systematic diagnosis for:

  • Feature development
  • Bug fixes in core systems
  • Refactoring existing code
  • Adding new API endpoints
  • Security-critical code

Common Pitfalls

Pitfall 1: Fixing Symptoms Instead of Root Causes

# ❌ Symptom-focused
$ claude "Change this line to add password hashing"
# Only fixes one instance

# ✓ Root cause-focused  
# Add rule to CLAUDE.md about password security
# Fixes ALL future instances

Pitfall 2: Assuming Single Root Cause

Error: Authentication tokens expire too quickly

Don't assume: "This is just a Rules Problem"

Actually multiple causes:
1. Rules: Token lifetime not specified
2. Testing: No tests for token expiration
3. Context: Didn't know existing token patterns

Solution: Check all five categories systematically.

Pitfall 3: Not Verifying Fixes

# ❌ Apply fix without verification
# Add rule to CLAUDE.md
# Assume it's fixed

# ✓ Verify fix works
# Add rule to CLAUDE.md
$ claude "Implement user authentication" # Re-run original task
# Confirm password hashing is now included ✓

Pitfall 4: Over-relying on One Fix Type

Team only adds rules to CLAUDE.md
 → CLAUDE.md becomes huge and unwieldy
 → LLM can't fit all rules in context

Better approach:
- Rules: High-level patterns and anti-patterns
- Quality Gates: Automate what can be checked
- Testing: Verify behavior programmatically
- Context: Provide examples, not rules

Balance across all five categories.

Measuring Success

Key Metrics

  1. Error recurrence rate

    • % of errors that recur after “fixing”
    • Target: < 5% (systematic fixes prevent recurrence)
  2. Time to diagnose

    • Minutes spent identifying root cause
    • Target: < 5 minutes (framework makes diagnosis fast)
  3. Error reduction trend

    • Weekly count of LLM errors requiring fixes
    • Target: Exponential decrease over time
  4. Fix durability

    • How long fixes remain effective
    • Target: Permanent (error class eliminated)

Success Indicators

✓ Errors become predictable (fit into five categories)
✓ Same error type never recurs after systematic fix
✓ Team shares fix patterns (documented in CLAUDE.md)
✓ New team members benefit from accumulated fixes
✓ LLM reliability improves over time without model changes

Conclusion

The Five-Point Error Diagnostic Framework transforms LLM error handling from reactive symptom-fixing to proactive root cause elimination.

Core Principle: Every LLM error fits into one of five categories:

  1. Context: Missing information
  2. Model: Insufficient capability
  3. Rules: Unclear guidelines
  4. Testing: Weak verification
  5. Quality Gates: Insufficient automation

Systematic Process:

  1. Diagnose which of the five root causes
  2. Apply corresponding systematic fix
  3. Verify error is eliminated
  4. Document fix for future reference

Result: Each fix permanently eliminates that error class, leading to exponential improvement in LLM reliability.

Key Insight: Stop treating LLM errors as random incidents. They’re systematic problems with systematic solutions.
