Summary
LLM errors often seem random and unpredictable, making it difficult to diagnose and prevent recurring issues. This framework provides a systematic approach by categorizing every LLM problem into one of five root causes: Context, Model, Rules, Testing, or Quality Gates. By addressing the root cause rather than symptoms, you permanently eliminate entire classes of errors.
The Problem
When LLMs generate incorrect code, developers waste time treating symptoms instead of diagnosing root causes. Errors appear random and unpredictable. The same mistakes recur despite fixes. Without a systematic diagnostic framework, teams struggle to improve LLM reliability over time.
The Solution
Every LLM error fits into one of five categories: Context Problems (missing information), Model Problems (insufficient capability), Rules Problems (unclear guidelines), Testing Problems (weak verification), or Quality Gate Problems (insufficient automation). By systematically diagnosing which category each error belongs to and applying the corresponding fix, you permanently reduce that error class in your system.
The Problem: Random, Recurring Errors
You’re working with an LLM to build a feature. It generates code that:
- Doesn’t follow your project’s patterns
- Uses non-existent functions
- Passes tests but breaks in production
- Violates documented conventions
- Repeats the same mistakes you just fixed
You fix each issue as it appears, but errors keep recurring. It feels random and unpredictable.
Why This Happens
Most developers treat LLM errors as individual, isolated incidents:
Error occurs → Fix the specific issue → Move on
This approach is reactive and symptom-focused. It doesn’t address why the error happened.
Result: The same error types keep appearing because the root cause was never fixed.
The Solution: Five Root Causes
Every LLM error falls into one of five categories:
- Context Problem: LLM lacks information to make correct decisions
- Model Problem: Current model lacks capability for task complexity
- Rules Problem: CLAUDE.md/documentation doesn’t specify behavior
- Testing Problem: Tests don’t catch the error type
- Quality Gate Problem: No automated check enforces the requirement
Key Insight: By diagnosing which category each error belongs to, you can apply systematic fixes that eliminate entire error classes permanently.
The Framework
Step 1: Diagnose Root Cause
When the LLM produces incorrect output, ask:
“Which of the five root causes is this?”
┌─────────────────────────────────────────────────────────┐
│ DIAGNOSTIC QUESTIONS │
├─────────────────────────────────────────────────────────┤
│ 1. Context: Did LLM have relevant examples/patterns? │
│ 2. Model: Is task too complex for current model? │
│ 3. Rules: Does CLAUDE.md specify this behavior? │
│ 4. Testing: Would better tests catch this? │
│ 5. Quality Gates: Could automation prevent this? │
└─────────────────────────────────────────────────────────┘
Step 2: Apply Corresponding Fix
Each root cause has specific solutions:
type RootCause =
| "context" // → Inject relevant examples
| "model" // → Use more powerful model
| "rules" // → Update CLAUDE.md
| "testing" // → Add/improve tests
| "quality-gate" // → Add automated checks
Step 3: Verify Fix
Re-run the original task and confirm the error is eliminated:
# Before fix
$ claude "Implement user authentication"
# → Generates code without password hashing
# After fix (added rule to CLAUDE.md)
$ claude "Implement user authentication"
# → Generates code WITH password hashing ✓
Step 4: Systematically Remove Error Class
Critical insight: Each fix doesn’t just solve one instance—it eliminates that entire error class from your system.
Error occurs → Diagnose root cause → Apply systematic fix → Error class eliminated forever
Root Cause 1: Context Problems
Symptoms
- LLM generates code that doesn’t match existing patterns
- LLM asks questions about basic project structure
- LLM references non-existent files or functions
- LLM produces generic code without project-specific knowledge
Example
$ claude "Add pagination to the users API"
# LLM generates:
// ❌ Generic pagination (doesn't match project patterns)
app.get('/users', (req, res) => {
const page = parseInt(req.query.page) || 1;
const limit = parseInt(req.query.limit) || 10;
// ...
});
But your project uses a standardized pagination helper:
// ✓ Project pattern
import { paginate } from './utils/pagination';
app.get('/users', async (req, res) => {
const users = await paginate(User, req.query);
res.json(users);
});
Diagnosis
This is a Context Problem. The LLM didn’t know about your pagination helper because it wasn’t in the context.
Solutions
- Inject relevant examples:
$ claude "Add pagination to the users API. Use the same pattern as products API."
# Include src/api/products.ts in context
- Add hierarchical CLAUDE.md files:
<!-- src/api/CLAUDE.md -->
# API Patterns
## Pagination
Always use `paginate()` helper from `utils/pagination.ts`:
```typescript
import { paginate } from '../utils/pagination';

app.get('/endpoint', async (req, res) => {
  const results = await paginate(Model, req.query);
  res.json(results);
});
```
- Reference existing implementations:
$ claude "Add pagination following the pattern in src/api/products.ts"
Result
Permanent fix: Once pagination patterns are documented in src/api/CLAUDE.md, the LLM will always use the correct pattern for API pagination.
Root Cause 2: Model Problems
Symptoms
- LLM fails on complex architecture decisions
- LLM struggles with multi-step reasoning
- LLM generates incomplete solutions
- LLM makes the same mistakes repeatedly despite context
Example
$ claude "Refactor the auth system to support OAuth, SAML, and JWT simultaneously"
# LLM generates incomplete solution:
# - Only implements OAuth
# - Doesn't handle provider switching
# - Missing integration with existing session management
Diagnosis
This is a Model Problem. The task requires complex architectural reasoning that the current model struggles with.
Solutions
- Switch to a more powerful model:
# If using Claude Sonnet, escalate to Opus for complex architectural work
$ claude --model opus "Refactor auth system..."
- Break the complex task into smaller steps:
# Instead of one large task:
$ claude "Step 1: Design auth provider interface"
$ claude "Step 2: Implement OAuth provider"
$ claude "Step 3: Implement SAML provider"
$ claude "Step 4: Add provider switching logic"
- Use chain-of-thought prompting:
$ claude "Before implementing, first:
1. Analyze existing auth system
2. Design provider interface
3. Plan migration strategy
Then implement the refactoring"
Result
Permanent fix: Document in your project’s CLAUDE.md which model to use for complex tasks:
<!-- CLAUDE.md -->
# Model Selection Guidelines
- **Simple tasks** (< 50 lines): Claude Sonnet
- **Complex refactoring** (> 100 lines): Claude Opus
- **Architecture design**: Claude Opus with chain-of-thought
Root Cause 3: Rules Problems
Symptoms
- LLM violates documented patterns (despite having context)
- LLM doesn’t handle edge cases correctly
- LLM repeats the same pattern violations
- LLM generates code that breaks project conventions
Example
$ claude "Add endpoint to delete user accounts"
# LLM generates:
// ❌ Hard delete (violates project rules)
app.delete('/users/:id', async (req, res) => {
await User.destroy({ where: { id: req.params.id } });
res.json({ success: true });
});
But your project has a strict “soft delete only” policy (for audit compliance).
Diagnosis
This is a Rules Problem. The LLM had context but didn’t know about the soft delete requirement.
Solutions
- Add specific rules to CLAUDE.md:
<!-- CLAUDE.md -->
# Data Deletion Rules
**NEVER use hard deletes.** Always use soft deletes:
```typescript
// ❌ NEVER do this
await Model.destroy({ where: { id } });

// ✓ ALWAYS do this
await Model.update({ deletedAt: new Date() }, { where: { id } });
```
Rationale: Audit compliance requires data retention.
- Document edge cases with examples:
## Edge Cases
### Deleting User with Active Subscriptions
Before soft-deleting user:
1. Cancel all active subscriptions
2. Send cancellation emails
3. Log deletion reason
4. Then soft-delete user record
- Add “DO NOT” anti-patterns:
## Anti-Patterns
❌ **DO NOT** use `.destroy()`
❌ **DO NOT** use `DELETE FROM` in raw SQL
❌ **DO NOT** permanently remove user data
✓ **DO** use `.update({ deletedAt })`
✓ **DO** check for related records first
✓ **DO** log deletion with reason
Result
Permanent fix: Once soft delete rules are in CLAUDE.md, the LLM will never generate hard deletes again.
Root Cause 4: Testing Problems
Symptoms
- Code passes tests but doesn’t work correctly in production
- Tests focus on happy path only
- Tests check for presence, not behavior
- LLM generates code that satisfies weak tests
Example
$ claude "Add input validation to user registration"
# LLM generates code that passes this weak test:
// ❌ Weak test (checks presence, not behavior)
test('validates email', () => {
const result = validateEmail('invalid');
expect(result).toBeDefined(); // Just checks it returns something
});
The code “validates” by always returning true:
// ❌ Passes weak test but doesn't actually validate
function validateEmail(email: string): boolean {
return true; // Always passes!
}
Diagnosis
This is a Testing Problem. The test is too weak to catch invalid implementations.
Solutions
- Test behavior, not presence:
// ✓ Strong test (checks actual behavior)
describe('validateEmail', () => {
  test('accepts valid emails', () => {
    expect(validateEmail('user@example.com')).toBe(true);
    expect(validateEmail('first.last@sub.example.co.uk')).toBe(true);
  });

  test('rejects invalid emails', () => {
    expect(validateEmail('invalid')).toBe(false);
    expect(validateEmail('@example.com')).toBe(false);
    expect(validateEmail('user@')).toBe(false);
    expect(validateEmail('')).toBe(false);
  });
});
- Test edge cases and error scenarios:
test('handles special characters', () => {
  expect(validateEmail('user+tag@example.com')).toBe(true);
});

test('handles very long emails', () => {
  const longEmail = 'a'.repeat(64) + '@example.com';
  expect(validateEmail(longEmail)).toBe(true);
});
- Focus on integration, not units:
// Instead of testing validateEmail() in isolation,
// test the whole registration flow:
test('registration rejects invalid email', async () => {
  const response = await request(app)
    .post('/register')
    .send({ email: 'invalid', password: 'secure123' });

  expect(response.status).toBe(400);
  expect(response.body.error).toContain('Invalid email');
});
Result
Permanent fix: Strong tests force the LLM to generate correct implementations. Weak tests allow broken code to pass.
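For instance, a minimal implementation that the strong tests above would force (an illustrative sketch; production email validation is usually stricter):
```typescript
function validateEmail(email: string): boolean {
  // Requires a non-empty local part, exactly one "@", and a dotted domain
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email);
}
```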
Root Cause 5: Quality Gate Problems
Symptoms
- Code compiles but violates architectural patterns
- Linting passes but conventions are broken
- Code works but doesn’t follow project standards
- Subtle bugs slip through (null checks, type errors, etc.)
Example
$ claude "Add user profile update endpoint"
# LLM generates:
// ❌ Compiles and lints, but violates architecture
app.put('/users/:id', async (req, res) => {
// Direct database access in route handler (bad!)
const user = await db.query('UPDATE users SET ... WHERE id = ?', [req.params.id]);
res.json(user);
});
Your architecture requires service layer separation, but there’s no automated check for this.
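For contrast, a sketch of what the service-layer version might look like (`userService` here is a hypothetical module standing in for your actual service):
```typescript
// ✓ Route handler delegates to the service layer
import { userService } from '../services/userService';

app.put('/users/:id', async (req, res) => {
  const user = await userService.updateProfile(req.params.id, req.body);
  res.json(user);
});
```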
Diagnosis
This is a Quality Gate Problem. The code passes TypeScript and ESLint, but violates architectural rules that aren’t automated.
Solutions
- Add custom ESLint rules:
// .eslintrc.js
module.exports = {
  rules: {
    'no-restricted-imports': ['error', {
      patterns: [{
        group: ['**/db', '**/database'],
        message: 'Import services, not db directly in routes'
      }]
    }]
  }
};
- Use AST-grep for pattern enforcement:
# .ast-grep/rules/no-db-in-routes.yml
id: no-db-in-routes
language: typescript
rule:
  pattern: |
    app.$METHOD($PATH, async ($REQ, $RES) => {
      $$$
      await db.$CALL($$$)
      $$$
    })
message: "Don't access db directly in route handlers. Use services."
- Add pre-commit hooks:
# .husky/pre-commit
#!/bin/bash
npm run type-check
npm run lint
npm run test
ast-grep scan  # Architectural pattern checks
- Enable stricter TypeScript:
// tsconfig.json
{
  "compilerOptions": {
    "strict": true,
    "noImplicitAny": true,
    "strictNullChecks": true,
    "noUncheckedIndexedAccess": true
  }
}
Result
Permanent fix: Once automated checks exist, the LLM can’t generate code that violates patterns (hooks will block it).
Real-World Workflow
Scenario: Building a payment processing feature
# Initial attempt
$ claude "Add Stripe payment processing"
# Error 1: LLM uses wrong API version
Diagnosis: Context Problem – LLM doesn’t know which Stripe API version you use
Fix: Add to CLAUDE.md:
# Payment Processing
Use Stripe API v2023-10-16:
```typescript
import Stripe from 'stripe';
const stripe = new Stripe(process.env.STRIPE_KEY, {
apiVersion: '2023-10-16'
});
```
**Result**: LLM now always uses correct API version ✓
---
# Second attempt
$ claude "Add Stripe payment processing"
# Error 2: LLM doesn't handle webhook signature verification
Diagnosis: Rules Problem – CLAUDE.md doesn’t specify webhook security
Fix: Add to CLAUDE.md:
## Webhook Security
**ALWAYS verify webhook signatures:**
```typescript
// Note: verification needs the raw request body, so use express.raw() here
app.post('/webhooks/stripe', express.raw({ type: 'application/json' }), (req, res) => {
const sig = req.headers['stripe-signature'];
// ✓ REQUIRED: Verify signature
const event = stripe.webhooks.constructEvent(
req.body,
sig,
process.env.STRIPE_WEBHOOK_SECRET
);
// Process event...
});
```
**Result**: LLM now always verifies webhook signatures ✓
---
# Third attempt
$ claude "Add Stripe payment processing"
# Error 3: Code passes tests but fails in production (idempotency)
Diagnosis: Testing Problem – Tests don’t check idempotency
Fix: Add integration test:
test('payment processing is idempotent', async () => {
const idempotencyKey = 'test-key-123';
// First charge
const charge1 = await processPayment({
amount: 1000,
idempotencyKey
});
// Second charge with same key (should return same result)
const charge2 = await processPayment({
amount: 1000,
idempotencyKey
});
expect(charge1.id).toBe(charge2.id); // Same charge
});
Result: LLM now generates idempotent payment code ✓
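One possible shape of the idempotent implementation this test forces (`processPayment` is a hypothetical helper; the second argument to Stripe's `create` call is its standard per-request options object):
```typescript
// Retries with the same idempotencyKey return the same PaymentIntent
// instead of creating a second charge.
async function processPayment({ amount, idempotencyKey }: {
  amount: number;
  idempotencyKey: string;
}) {
  return stripe.paymentIntents.create(
    { amount, currency: 'usd' },
    { idempotencyKey }
  );
}
```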
# Fourth attempt
$ claude "Add Stripe payment processing"
# Error 4: LLM logs sensitive card data
Diagnosis: Quality Gate Problem – No automated check for PCI compliance
Fix: Add AST-grep rule:
id: no-log-card-data
language: typescript
rule:
  pattern: |
    logger.$METHOD($$$, $CARD, $$$)
constraints:
  CARD:
    regex: "card|payment|stripe"
message: "Never log card/payment data (PCI compliance)"
Result: Pre-commit hook blocks code that logs sensitive data ✓
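Roughly what that rule is meant to block versus allow (illustrative only):
```typescript
// ❌ Blocked by the pre-commit hook: payment details passed to the logger
logger.info('charge created', paymentDetails);

// ✓ Allowed: log non-sensitive identifiers only
logger.info('charge created', { chargeId: charge.id, amount: charge.amount });
```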
Final State
After four iterations, you’ve systematically eliminated four error classes:
- ✓ Context: Stripe patterns documented
- ✓ Rules: Security requirements specified
- ✓ Testing: Idempotency verified
- ✓ Quality Gates: PCI compliance automated
Future payment features: LLM will never make these four mistake types again.
Best Practices
1. Diagnose Before Fixing
# ❌ React immediately
$ claude "Fix the bug in auth.ts"
# ✓ Diagnose first
# Ask: "Is this Context, Model, Rules, Testing, or Quality Gate?"
# Then apply systematic fix
2. Fix Root Cause, Not Symptom
# ❌ Symptom fix
$ claude "Change line 47 to use bcrypt.hash instead of plain text"
# ✓ Root cause fix
# Add to CLAUDE.md:
# "ALWAYS hash passwords with bcrypt before storing"
# Now ALL future password code will be correct
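The rule lands best when CLAUDE.md also shows the pattern; a sketch of what that entry might include (bcrypt's standard `hash` API with a typical cost factor):
```typescript
import bcrypt from 'bcrypt';

// ✓ ALWAYS hash before storing
const passwordHash = await bcrypt.hash(plainTextPassword, 10);
await User.create({ email, passwordHash });

// ❌ NEVER store the plain-text password
// await User.create({ email, password: plainTextPassword });
```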
3. Document Every Fix
Maintain a log of diagnosed issues:
<!-- PROJECT_LOG.md -->
# Error Diagnosis Log
## 2025-11-01: Password Hashing
- **Error**: LLM stored passwords in plain text
- **Root Cause**: Rules Problem
- **Fix**: Added password security rules to CLAUDE.md
- **Result**: All future auth code includes hashing ✓
## 2025-10-28: Pagination Pattern
- **Error**: LLM used custom pagination instead of helper
- **Root Cause**: Context Problem
- **Fix**: Added pagination examples to src/api/CLAUDE.md
- **Result**: All future API endpoints use standard helper ✓
This log helps you:
- Track improvement over time
- Identify recurring patterns
- Share learnings with team
4. Apply Multiple Fixes When Needed
Some errors have multiple root causes:
Error: LLM generates SQL injection vulnerability
Root Causes:
1. Rules Problem: CLAUDE.md doesn't prohibit raw SQL
2. Testing Problem: No security tests
3. Quality Gate Problem: No SQL injection scanner
Fixes:
1. Add rule: "NEVER use raw SQL, always use ORM"
2. Add test: Check all inputs are parameterized
3. Add hook: Run SQL injection scanner pre-commit
Result: Triple-layer defense against SQL injection.
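As a concrete example of the testing layer, a hypothetical integration test that sends a classic injection payload and verifies it is treated as data:
```typescript
test('user search is not vulnerable to SQL injection', async () => {
  const response = await request(app)
    .get('/users')
    .query({ name: "'; DROP TABLE users; --" });

  // The payload is handled as an ordinary (non-matching) search term
  expect(response.status).toBe(200);

  // The users table still exists and is queryable
  const users = await User.findAll();
  expect(Array.isArray(users)).toBe(true);
});
```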
5. Measure Improvement
Track error reduction over time:
Week 1: 15 errors requiring fixes
Week 2: 12 errors (3 error classes eliminated)
Week 3: 7 errors (5 more classes eliminated)
Week 4: 3 errors (mostly edge cases)
Goal: Exponential error reduction as systematic fixes compound.
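If you keep the PROJECT_LOG.md format from Best Practice 3, the weekly count can be computed rather than estimated; a small sketch (file name and heading format assumed from the log example above):
```typescript
// Counts diagnosed errors per week from PROJECT_LOG.md
// (assumes the "## YYYY-MM-DD: Title" heading format shown earlier)
import { readFileSync } from 'fs';

const log = readFileSync('PROJECT_LOG.md', 'utf8');
const dates = [...log.matchAll(/^## (\d{4}-\d{2}-\d{2}):/gm)].map(m => m[1]);

const perWeek = new Map<string, number>();
for (const date of dates) {
  const d = new Date(date);
  const weekStart = new Date(d);
  weekStart.setDate(d.getDate() - ((d.getDay() + 6) % 7)); // bucket by Monday
  const key = weekStart.toISOString().slice(0, 10);
  perWeek.set(key, (perWeek.get(key) ?? 0) + 1);
}

console.table([...perWeek.entries()].sort((a, b) => a[0].localeCompare(b[0])));
```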
When NOT to Use
This framework is overkill for:
❌ One-off scripts
# Throwaway data migration script
$ claude "Write script to migrate users table"
# Just fix inline, don't systematize
❌ Exploration/prototyping
# Trying different approaches
$ claude "Show me 3 ways to implement caching"
# No need for systematic fixes during exploration
❌ Simple, isolated bugs
# Typo in variable name
$ claude "Fix typo: `usre` should be `user`"
# Simple one-off fix
✓ Always use for production code
Apply systematic diagnosis for:
- Feature development
- Bug fixes in core systems
- Refactoring existing code
- Adding new API endpoints
- Security-critical code
Common Pitfalls
Pitfall 1: Fixing Symptoms Instead of Root Causes
# ❌ Symptom-focused
$ claude "Change this line to add password hashing"
# Only fixes one instance
# ✓ Root cause-focused
# Add rule to CLAUDE.md about password security
# Fixes ALL future instances
Pitfall 2: Assuming Single Root Cause
Error: Authentication tokens expire too quickly
Don't assume: "This is just a Rules Problem"
Actually multiple causes:
1. Rules: Token lifetime not specified
2. Testing: No tests for token expiration
3. Context: Didn't know existing token patterns
Solution: Check all five categories systematically.
Pitfall 3: Not Verifying Fixes
# ❌ Apply fix without verification
# Add rule to CLAUDE.md
# Assume it's fixed
# ✓ Verify fix works
# Add rule to CLAUDE.md
$ claude "Implement user authentication" # Re-run original task
# Confirm password hashing is now included ✓
Pitfall 4: Over-relying on One Fix Type
Team only adds rules to CLAUDE.md
→ CLAUDE.md becomes huge and unwieldy
→ LLM can't fit all rules in context
Better approach:
- Rules: High-level patterns and anti-patterns
- Quality Gates: Automate what can be checked
- Testing: Verify behavior programmatically
- Context: Provide examples, not rules
Balance across all five categories.
Measuring Success
Key Metrics
- Error recurrence rate
  - % of errors that recur after “fixing”
  - Target: < 5% (systematic fixes prevent recurrence)
- Time to diagnose
  - Minutes spent identifying root cause
  - Target: < 5 minutes (framework makes diagnosis fast)
- Error reduction trend
  - Weekly count of LLM errors requiring fixes
  - Target: Exponential decrease over time
- Fix durability
  - How long fixes remain effective
  - Target: Permanent (error class eliminated)
Success Indicators
✓ Errors become predictable (fit into five categories)
✓ Same error type never recurs after systematic fix
✓ Team shares fix patterns (documented in CLAUDE.md)
✓ New team members benefit from accumulated fixes
✓ LLM reliability improves over time without model changes
Conclusion
The Five-Point Error Diagnostic Framework transforms LLM error handling from reactive symptom-fixing to proactive root cause elimination.
Core Principle: Every LLM error fits into one of five categories:
- Context: Missing information
- Model: Insufficient capability
- Rules: Unclear guidelines
- Testing: Weak verification
- Quality Gates: Insufficient automation
Systematic Process:
- Diagnose which of the five root causes applies
- Apply corresponding systematic fix
- Verify error is eliminated
- Document fix for future reference
Result: Each fix permanently eliminates that error class, leading to exponential improvement in LLM reliability.
Key Insight: Stop treating LLM errors as random incidents. They’re systematic problems with systematic solutions.
Related Concepts
- Hierarchical Context Patterns – Provide localized context to prevent Context Problems
- Verification Sandwich Pattern – Systematic testing workflow to catch errors early
- Claude Code Hooks Quality Gates – Automate quality gates to prevent Quality Gate Problems
- Integration Testing Patterns – Strong tests that catch real behavior issues
- Quality Gates as Information Filters – How prevention strategies reduce entropy
- Error Messages as Training Data – Document diagnosed errors in ERRORS.md for persistent memory
- Context Debugging Framework – Systematic debugging hierarchy complements root cause analysis
- Prevention Protocol – Turn diagnosed bugs into systematic prevention measures
- Test-Based Regression Patching – Write failing tests before fixing diagnosed bugs
- Clean Slate Trajectory Recovery – Escape bad LLM trajectories when diagnosis reveals context rot
References
- Root Cause Analysis (Wikipedia) – The systematic process of identifying root causes of problems
- Five Whys Technique – Iterative interrogative technique used to explore cause-and-effect relationships

