## Summary
Write a failing test that reproduces the bug before asking an LLM to fix it. This reduces fix iterations by 50%+ by giving the LLM a concrete verification target and preventing regressions. The test becomes a quality gate that definitively proves when the bug is fixed.
**The problem:** When debugging with LLMs, developers often iterate multiple times on broken solutions because the LLM doesn't have a clear, verifiable target. Without a test, you rely on manual verification, which is slow and error-prone. The LLM may think it fixed the bug when it only addressed symptoms, leading to 5-10+ iterations.

**The solution:** Before asking the LLM to fix a bug, write a failing test that reproduces it. This test becomes the success criterion: when it passes, the bug is definitively fixed. The LLM can iterate autonomously until the test passes, reducing human verification overhead from 5-10 manual checks to a single test run.
## The Problem
When debugging with AI coding agents, you often encounter this frustrating cycle:
1. User: "Fix the login bug - users can't sign in"
2. LLM: "I've fixed it by updating the auth handler"
3. User tests: Still broken
4. User: "Still not working, I see error X"
5. LLM: "Try this updated version"
6. User tests: Different error
7. User: "Now I get error Y"
8. LLM: "Here's another fix"
9. User tests: Back to original error
10. (Repeat 3-7 more times...)
This iterative debugging loop wastes time and creates frustration because:
### 1. No Clear Success Criteria
The LLM doesn’t have a programmatic way to verify its fix worked. It relies on your manual testing and feedback, which is:
- Slow: Each iteration requires manual testing
- Ambiguous: “Still broken” doesn’t tell the LLM what’s wrong
- Error-prone: You might miss edge cases or regression bugs
- Exhausting: 5-10 iterations burn developer time and energy
### 2. Symptom vs. Root Cause

Without a test, the LLM often fixes symptoms rather than root causes:

```typescript
// Bug: User authentication fails silently

// ❌ Symptom fix (doesn't solve the root cause)
if (!user) {
  console.log('No user found'); // Added logging
  return null;
}

// ✅ Root cause fix (a test would catch this)
if (!user) {
  throw new AuthenticationError('User not found', { userId });
}
```
The symptom fix might appear to work in basic testing, but the underlying issue (silent failures) remains.
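This is exactly the gap a pre-written test exposes. A minimal sketch, assuming the fix lives in an async `authenticateUser` function and that `AuthenticationError` is exported from the same module (both names follow the examples in this document):

```typescript
import { test, expect } from 'vitest';
import { authenticateUser, AuthenticationError } from './auth'; // hypothetical module path

test('should surface a missing user as AuthenticationError, not a silent null', async () => {
  // The symptom fix resolves to null, so this assertion fails;
  // the root-cause fix throws AuthenticationError and the test passes.
  await expect(
    authenticateUser('nonexistent@example.com', 'password123')
  ).rejects.toThrow(AuthenticationError);
});
```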
### 3. Regression Risk

When fixing bugs without tests, you risk introducing new bugs:

```typescript
// Original bug: Race condition in async handler
async function handleLogin(email: string) {
  const user = await findUser(email);
  const session = await createSession(user.id);
  return session;
}

// LLM's fix (solves the race condition but breaks error handling)
async function handleLogin(email: string) {
  const user = await findUser(email);
  if (!user) return null; // ❌ Silent failure introduced
  const session = await createSession(user.id);
  return session;
}
```
Without tests for existing behavior, you don’t know if the fix broke something else.
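A behavioral test for the existing error path would catch this kind of regression immediately. A hedged sketch, reusing the `handleLogin` name from the snippet above (the module path is an assumption):

```typescript
import { test, expect } from 'vitest';
import { handleLogin } from './auth'; // hypothetical module path

// Protects existing behavior: unknown users must produce an error,
// so a "fix" that quietly introduces `return null` fails this test.
test('should reject login for unknown users instead of returning null', async () => {
  await expect(handleLogin('nonexistent@example.com')).rejects.toThrow();
});
```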
### The Cost

For a typical bug fix:

- Manual debugging: 5-10 iterations × 2-3 minutes = 10-30 minutes
- With frustration overhead: 15-45 minutes per bug

For a team fixing 10 bugs/week:

- 10 bugs × 20 min = 200 minutes/week ≈ 3.3 hours/week wasted
## The Solution

Write a failing test that reproduces the bug BEFORE asking the LLM to fix it.

This simple practice transforms debugging from an ambiguous iterative process into a concrete verification loop:

1. Write a failing test that reproduces the bug
2. Ask the LLM to fix the code
3. Run the test
4. If the test passes → bug fixed ✅
   If the test fails → the LLM iterates (with a clear error message)
## Why This Works

### 1. Concrete Success Criteria

The test definitively proves when the bug is fixed:

```typescript
// Before: Ambiguous verification
// "The login still doesn't work" ❌

// After: Clear verification ✅
test('should authenticate user with valid credentials', () => {
  const result = authenticateUser('test@example.com', 'password123');
  expect(result).toEqual({
    success: true,
    user: { email: 'test@example.com' },
    sessionToken: expect.any(String),
  });
});
```
### 2. Forces Root Cause Thinking

Writing a test forces you to understand the bug's root cause:

```typescript
// Vague problem: "Login is broken"
// What does "broken" mean? Silent failure? Wrong error? Timeout?

// A test forces specificity:
test('should throw AuthenticationError when user not found', async () => {
  await expect(authenticateUser('nonexistent@example.com', 'pass'))
    .rejects.toThrow(AuthenticationError);
});

test('should throw NetworkError (not AuthenticationError) for timeouts', async () => {
  mockApiTimeout();
  await expect(authenticateUser('test@example.com', 'pass'))
    .rejects.toThrow(NetworkError);
});
```
### 3. Prevents Regressions

The test remains in your suite, preventing the bug from returning:

```typescript
// This test runs on every commit
test('regression: auth should handle race condition', async () => {
  // Simulate rapid concurrent requests
  const promises = Array.from({ length: 10 }, () =>
    authenticateUser('test@example.com', 'password123')
  );
  const results = await Promise.all(promises);

  // All should succeed without race condition errors
  results.forEach(result => {
    expect(result.success).toBe(true);
  });
});
```
Future code changes that reintroduce the bug will be caught immediately.
### 4. Enables Autonomous LLM Iteration
With a test, the LLM can iterate without human verification:
1. LLM attempts fix #1
2. Runs test → Fails with "TypeError: Cannot read property 'id' of null"
3. LLM analyzes error → Attempts fix #2
4. Runs test → Fails with "AuthenticationError: User not found"
5. LLM analyzes error → Attempts fix #3
6. Runs test → Passes ✅
7. Done
You’re only notified when the test passes, saving you 3-6 manual verification cycles.
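The same loop can be made mechanical by letting the test command gate each attempt. Below is a minimal sketch in TypeScript: the `askLlmForFix` helper (which requests and applies the agent's next patch) is hypothetical, while the `npm test` invocation matches the test run shown later:

```typescript
import { execSync } from 'node:child_process';
import { askLlmForFix } from './agent'; // hypothetical helper that applies the LLM's next patch

async function fixUntilGreen(maxAttempts = 5): Promise<void> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // The failing test is the quality gate: the loop only exits when it passes.
      execSync('npm test auth.test.ts', { stdio: 'pipe' });
      console.log(`Bug fixed on attempt ${attempt} ✅`);
      return;
    } catch (error) {
      // Feed the test failure output back to the LLM for the next attempt.
      const output = (error as { stdout?: Buffer }).stdout?.toString() ?? String(error);
      await askLlmForFix(output);
    }
  }
  console.log('Max attempts reached; escalate to a human.');
}

fixUntilGreen().catch(console.error);
```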
## Implementation

### Step 1: Reproduce the Bug
Before writing a test, understand exactly how to reproduce the bug:
**Reproduction Steps**:
1. Navigate to /login
2. Enter email: test@example.com
3. Enter password: password123
4. Click "Sign In"
5. Observe: Error "Cannot read property 'id' of undefined"
**Expected**: User is authenticated and redirected to dashboard
**Actual**: Error thrown, login fails
### Step 2: Write a Failing Test

Translate the reproduction steps into a test:

```typescript
import { describe, test, expect, beforeEach } from 'vitest';
import { authenticateUser } from './auth';
import { hashPassword, mockDatabase } from './test-utils';

describe('Login Bug - Cannot read property id of undefined', () => {
  beforeEach(() => {
    // Setup: Create test user in database
    mockDatabase.users.create({
      email: 'test@example.com',
      passwordHash: hashPassword('password123'),
    });
  });

  test('should authenticate user without throwing', async () => {
    // This currently fails with "Cannot read property 'id' of undefined"
    const result = await authenticateUser(
      'test@example.com',
      'password123'
    );

    expect(result).toBeDefined();
    expect(result.success).toBe(true);
    expect(result.user.email).toBe('test@example.com');
  });
});
```
Verify the test fails:

```
$ npm test auth.test.ts

❌ FAIL  Login Bug - Cannot read property id of undefined
  ✕ should authenticate user without throwing (24 ms)

  TypeError: Cannot read property 'id' of undefined
      at authenticateUser (auth.ts:42:18)
```
✅ Good – The test reproduces the bug.
### Step 3: Provide Test to LLM

Now give the LLM both the failing test and the buggy code:

```
I have a bug in the authentication system. Here's a failing test that reproduces it:

[Paste test code]

The test fails with: "TypeError: Cannot read property 'id' of undefined"

Here's the buggy code:

[Paste auth.ts]

Please fix the code so the test passes. Run the test after your changes to verify.
```
### Step 4: LLM Iterates Until Test Passes

The LLM now has a clear target:

```typescript
// LLM's fix attempt
async function authenticateUser(email: string, password: string) {
  const user = await db.users.findByEmail(email);

  // Fix: Add null check before accessing user.id
  if (!user) {
    throw new AuthenticationError('User not found');
  }

  const isValid = await verifyPassword(password, user.passwordHash);
  if (!isValid) {
    throw new AuthenticationError('Invalid password');
  }

  const session = await createSession(user.id); // ✅ Now safe

  return {
    success: true,
    user: { email: user.email },
    sessionToken: session.token,
  };
}
```
Run test:

```
$ npm test auth.test.ts

✅ PASS  Login Bug - Cannot read property id of undefined
  ✓ should authenticate user without throwing (18 ms)

Test Suites: 1 passed, 1 total
Tests:       1 passed, 1 total
```
✅ Bug fixed – Test passes.
### Step 5: Add Edge Case Tests

Once the basic fix works, add tests for edge cases:

```typescript
describe('Authentication edge cases', () => {
  test('should handle non-existent user gracefully', async () => {
    await expect(
      authenticateUser('nonexistent@example.com', 'password')
    ).rejects.toThrow(AuthenticationError);
  });

  test('should handle incorrect password', async () => {
    await expect(
      authenticateUser('test@example.com', 'wrongpassword')
    ).rejects.toThrow(AuthenticationError);
  });

  test('should handle database connection errors', async () => {
    mockDatabase.simulateConnectionError();

    await expect(
      authenticateUser('test@example.com', 'password123')
    ).rejects.toThrow(DatabaseError);
  });
});
```
Ask the LLM to ensure all tests pass.
## Real-World Example

### Bug Report

**Bug**: API returns 500 error when user has no subscriptions

**Reproduction**:
1. Create user without subscriptions
2. GET /api/user/subscriptions
3. Observe: 500 Internal Server Error

**Expected**: 200 OK with empty array `[]`
### Step 1: Write Failing Test

```typescript
import { describe, test, expect } from 'vitest';
import { getUserSubscriptions } from './subscriptions';
import { createMockUser } from './test-utils';

describe('Subscription Bug - 500 error for users without subscriptions', () => {
  test('should return empty array for user with no subscriptions', async () => {
    const user = await createMockUser({ subscriptions: [] });

    const result = await getUserSubscriptions(user.id);

    expect(result).toEqual({
      subscriptions: [],
      total: 0,
    });
  });
});
```
Run test:

```
❌ FAIL
  TypeError: Cannot read property 'map' of undefined
      at getUserSubscriptions (subscriptions.ts:12:25)
```
### Step 2: Prompt LLM

```
Bug: getUserSubscriptions() crashes when user has no subscriptions.

Failing test:
[Paste test code]

Buggy code:
[Paste subscriptions.ts]

Fix the code so the test passes.
```
### Step 3: LLM Fixes

```typescript
// Before (buggy):
async function getUserSubscriptions(userId: string) {
  const user = await db.users.findById(userId);

  return {
    subscriptions: user.subscriptions.map(formatSubscription), // ❌ Crashes if undefined
    total: user.subscriptions.length,
  };
}

// After (fixed):
async function getUserSubscriptions(userId: string) {
  const user = await db.users.findById(userId);

  if (!user) {
    throw new NotFoundError('User not found');
  }

  const subscriptions = user.subscriptions || []; // ✅ Handle undefined/null

  return {
    subscriptions: subscriptions.map(formatSubscription),
    total: subscriptions.length,
  };
}
```
Run test:

```
✅ PASS
  ✓ should return empty array for user with no subscriptions (15 ms)
```
### Step 4: Expand Test Coverage

```typescript
describe('Subscription retrieval', () => {
  test('should return user subscriptions', async () => {
    const user = await createMockUser({
      subscriptions: [
        { plan: 'pro', status: 'active' },
        { plan: 'enterprise', status: 'active' },
      ],
    });

    const result = await getUserSubscriptions(user.id);

    expect(result.total).toBe(2);
    expect(result.subscriptions).toHaveLength(2);
  });

  test('should handle non-existent user', async () => {
    await expect(
      getUserSubscriptions('nonexistent-id')
    ).rejects.toThrow(NotFoundError);
  });
});
```
## Best Practices

### 1. Test at the Right Level

Choose the appropriate test level for the bug:

```typescript
// ❌ Too low-level (tests implementation details)
test('should call findByEmail with correct email', () => {
  authenticateUser('test@example.com', 'pass');
  expect(mockDb.findByEmail).toHaveBeenCalledWith('test@example.com');
});

// ✅ Right level (tests behavior)
test('should authenticate valid user', async () => {
  const result = await authenticateUser('test@example.com', 'pass');
  expect(result.success).toBe(true);
});

// ✅ Also good (integration test for complex bugs)
test('should authenticate user end-to-end', async () => {
  const response = await request(app)
    .post('/api/auth/login')
    .send({ email: 'test@example.com', password: 'pass' });

  expect(response.status).toBe(200);
  expect(response.body.sessionToken).toBeDefined();
});
```
Rule of thumb: Test at the lowest level that reliably reproduces the bug.
### 2. Make Tests Deterministic

Avoid flaky tests that sometimes pass and sometimes fail:

```typescript
// ❌ Flaky (depends on timing)
test('should complete within 100ms', async () => {
  const start = Date.now();
  await processData([1, 2, 3]);
  expect(Date.now() - start).toBeLessThan(100);
});

// ✅ Deterministic (tests behavior, not timing)
test('should process all items', async () => {
  const result = await processData([1, 2, 3]);
  expect(result).toEqual([2, 4, 6]);
});
```
### 3. Include Error Messages in Tests

Specific assertions produce failure messages that tell the LLM exactly what was expected:

```typescript
// ❌ Generic assertion
expect(result).toBe(true);

// ✅ Descriptive assertions
expect(result.success).toBe(true);
expect(result.user).toBeDefined();
expect(result.user.email).toBe('test@example.com');
```
### 4. Test the Fix, Not Just the Bug

Don't just verify the error is gone – verify the correct behavior:

```typescript
// ❌ Only tests that the error is gone
test('should not throw error', async () => {
  await expect(authenticateUser('test@example.com', 'pass'))
    .resolves.not.toThrow();
});

// ✅ Tests correct behavior
test('should return authenticated user', async () => {
  const result = await authenticateUser('test@example.com', 'pass');

  expect(result).toMatchObject({
    success: true,
    user: { email: 'test@example.com' },
    sessionToken: expect.any(String),
  });
});
```
### 5. Keep Tests Focused

One test should verify one behavior:

```typescript
// ❌ Tests too many things
test('authentication works', async () => {
  const result1 = await authenticateUser('test@example.com', 'pass');
  expect(result1.success).toBe(true);

  const result2 = await authenticateUser('nonexistent@example.com', 'pass');
  expect(result2.success).toBe(false);

  const result3 = await authenticateUser('test@example.com', 'wrong');
  expect(result3.success).toBe(false);
});

// ✅ Focused tests
test('should authenticate valid user', async () => {
  const result = await authenticateUser('test@example.com', 'pass');
  expect(result.success).toBe(true);
});

test('should reject non-existent user', async () => {
  await expect(authenticateUser('nonexistent@example.com', 'pass'))
    .rejects.toThrow(AuthenticationError);
});

test('should reject incorrect password', async () => {
  await expect(authenticateUser('test@example.com', 'wrong'))
    .rejects.toThrow(AuthenticationError);
});
```
## Common Pitfalls

### ❌ Pitfall 1: Writing Tests After the Fix

**Problem**: Writing tests after fixing the bug doesn't help reduce iterations.

Wrong order:
1. Report bug to LLM
2. LLM attempts fix
3. Manually verify (3-5 iterations)
4. Bug fixed
5. Write test

Correct order:
1. Write failing test
2. Report bug + test to LLM
3. LLM fixes until test passes (autonomous)
4. Bug fixed
### ❌ Pitfall 2: Vague Test Assertions

**Problem**: Tests that don't clearly specify expected behavior:

```typescript
// ❌ Too vague
test('login should work', async () => {
  const result = await login('test@example.com', 'pass');
  expect(result).toBeTruthy();
});

// ✅ Specific expectations
test('login should return user and session token', async () => {
  const result = await login('test@example.com', 'pass');

  expect(result.user).toMatchObject({
    email: 'test@example.com',
    id: expect.any(String),
  });
  expect(result.sessionToken).toMatch(/^[a-zA-Z0-9-_]+$/);
  expect(result.expiresAt).toBeInstanceOf(Date);
});
```
### ❌ Pitfall 3: Testing Implementation Instead of Behavior

**Problem**: Tests break when refactoring, even if behavior is correct:

```typescript
// ❌ Tests implementation
test('should call bcrypt.compare', async () => {
  await authenticateUser('test@example.com', 'pass');
  expect(bcrypt.compare).toHaveBeenCalled();
});

// ✅ Tests behavior
test('should authenticate user with valid password', async () => {
  const result = await authenticateUser('test@example.com', 'validPass');
  expect(result.success).toBe(true);
});

test('should reject user with invalid password', async () => {
  await expect(authenticateUser('test@example.com', 'wrongPass'))
    .rejects.toThrow(AuthenticationError);
});
```
### ❌ Pitfall 4: Skipping Edge Cases

**Problem**: Only testing the happy path leaves edge cases unverified:

```typescript
// ❌ Only happy path
test('should authenticate user', async () => {
  const result = await authenticateUser('test@example.com', 'pass');
  expect(result.success).toBe(true);
});

// ✅ Covers edge cases
describe('Authentication', () => {
  test('should authenticate valid user', async () => {
    const result = await authenticateUser('test@example.com', 'pass');
    expect(result.success).toBe(true);
  });

  test('should handle empty email', async () => {
    await expect(authenticateUser('', 'pass'))
      .rejects.toThrow(ValidationError);
  });

  test('should handle empty password', async () => {
    await expect(authenticateUser('test@example.com', ''))
      .rejects.toThrow(ValidationError);
  });

  test('should handle malformed email', async () => {
    await expect(authenticateUser('not-an-email', 'pass'))
      .rejects.toThrow(ValidationError);
  });
});
```
## Integration with Other Patterns

### Combine with Actor-Critic Pattern

Use tests as the "critic" in an actor-critic loop. Prompt:

```
Fix this bug using an actor-critic approach:

1. ACTOR: Generate a fix
2. CRITIC: Run the test and analyze failures
3. ACTOR: Refine based on test output
4. Repeat until the test passes

Here's the failing test:
[test code]

Here's the buggy code:
[buggy code]
```
### Combine with Quality Gates

Tests become automatic quality gates in CI/CD:

```yaml
# .github/workflows/ci.yml
name: CI
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
      - run: npm install
      - run: npm test  # ← Regression tests run automatically
```
Every commit must pass regression tests, preventing bugs from returning.
### Combine with Institutional Memory

Add regression tests to your learning files:

````markdown
# ERRORS.md

## Bug: Authentication Race Condition (Fixed 2025-11-02)

**Symptom**: Login occasionally failed with "Cannot read property 'id' of undefined"

**Root Cause**: Race condition when concurrent requests accessed the user object

**Regression Test**:
```typescript
test('regression: auth handles concurrent requests', async () => {
  const promises = Array.from({ length: 10 }, () =>
    authenticateUser('test@example.com', 'pass')
  );
  const results = await Promise.all(promises);
  results.forEach(r => expect(r.success).toBe(true));
});
```

**Prevention**: Always write a regression test when fixing race conditions
````
## Measuring Success
### Key Metrics
**1. Iterations to Fix**
Track how many LLM iterations are needed:
Without test-first:
- Average iterations: 5-8
- Range: 3-12
With test-first:
- Average iterations: 2-3
- Range: 1-5
Improvement: 50-60% reduction
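If you want these numbers for your own team, a lightweight log is enough. A minimal sketch; the file name and record shape below are assumptions, not an established format:

```typescript
import { appendFileSync } from 'node:fs';

// Hypothetical per-bug record; adjust fields to your workflow.
interface FixRecord {
  bugId: string;
  testFirst: boolean;     // was a failing test written before the fix?
  llmIterations: number;  // attempts until the test passed
  minutesToFix: number;   // wall-clock time from report to green test
}

// Append one JSON line per fixed bug; average the columns later.
export function recordFix(record: FixRecord, logPath = 'fix-metrics.jsonl'): void {
  appendFileSync(logPath, JSON.stringify(record) + '\n');
}

// Example usage after a fix lands:
recordFix({ bugId: 'AUTH-123', testFirst: true, llmIterations: 2, minutesToFix: 9 });
```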
**2. Time to Fix**
Measure total time from bug report to verified fix:
Without test-first:
- Average: 20-30 minutes
- Includes: Multiple manual verifications, back-and-forth with LLM
With test-first:
- Average: 8-12 minutes
- Includes: Writing test (3-5 min) + LLM fix (5-7 min)
Improvement: 50-60% time savings
**3. Regression Rate**
Track how often bugs return:
Without regression tests:
- 15-20% of bugs return within 6 months
With regression tests:
- 0-2% of bugs return (only if test was inadequate)
Improvement: 90%+ reduction in regressions
## Conclusion
Test-based regression patching transforms debugging from an ambiguous, iterative process into a concrete, verifiable workflow.
**Key Principles**:
1. **Write the test first** - Before asking the LLM to fix anything
2. **Make it fail** - Verify the test reproduces the bug
3. **Give test to LLM** - Let it iterate until the test passes
4. **Expand coverage** - Add edge case tests once the basic fix works
5. **Keep the tests** - They prevent regressions forever
**The Impact**:
- **50%+ faster bug fixes**: From 20-30 minutes to 8-12 minutes
- **Autonomous iteration**: LLM fixes bugs without human verification
- **Near-zero regressions**: Kept tests prevent fixed bugs from returning
- **Better code quality**: Tests force understanding of root causes
**When to Use**:
- ✅ Any reproducible bug
- ✅ Regressions (especially important to prevent recurrence)
- ✅ Edge cases discovered in production
- ✅ Bugs reported by users (write test from reproduction steps)
- ✅ Intermittent bugs (test helps isolate the condition)
**When NOT to Use**:
- ❌ Bugs you can't reproduce reliably
- ❌ Issues with external services (mocking required)
- ❌ UI bugs (use visual regression testing instead)
- ❌ Performance issues (use benchmarks instead)
By making test-based regression patching your default debugging workflow, you'll fix bugs faster, prevent regressions, and build a more robust codebase - all while letting the LLM do more of the heavy lifting.
## Related Concepts
- [Actor-Critic Adversarial Coding](./actor-critic-adversarial-coding.md) - Use tests as the critic in a generation loop
- [Quality Gates as Information Filters](./quality-gates-as-information-filters.md) - Tests filter out invalid solutions
- [Verification Sandwich Pattern](./verification-sandwich-pattern.md) - Establish clean baseline before and after changes
- [Compounding Effects of Quality Gates](./compounding-effects-quality-gates.md) - How stacked gates multiply quality improvements
- [Claude Code Hooks as Quality Gates](./claude-code-hooks-quality-gates.md) - Automate test execution on every code change
- [Test-Driven Prompting](./test-driven-prompting.md) - Write tests before generating code to constrain LLM output
- [Integration Testing Patterns](./integration-testing-patterns.md) - Integration tests catch more regression bugs
- [Test Custom Infrastructure](./test-custom-infrastructure.md) - Test your testing infrastructure to avoid cascading failures
- [Property-Based Testing for LLM-Generated Code](./property-based-testing.md) - Catch edge cases automatically with invariants
- [Automated Flaky Test Detection](./flaky-test-diagnosis-script.md) - Diagnose intermittent test failures systematically
- [Early Linting Prevents Ratcheting](./early-linting-prevents-ratcheting.md) - Catch issues early before they compound
- [Trust But Verify Protocol](./trust-but-verify-protocol.md) - Tests provide the verification layer
- [Institutional Memory Learning Files](./institutional-memory-learning-files.md) - Document regression tests in ERRORS.md
- [Error Messages as Training Data](./error-messages-as-training.md) - Track recurring errors and link to regression tests
- [Five-Point Error Diagnostic Framework](./five-point-error-diagnostic-framework.md) - Diagnose root cause before writing regression test
- [Context Debugging Framework](./context-debugging-framework.md) - Tests provide concrete verification targets for Layer 4 debugging
- [Prevention Protocol](./prevention-protocol.md) - Tests are a key prevention measure after bug fixes
- [Clean Slate Trajectory Recovery](./clean-slate-trajectory-recovery.md) - Use failing tests as constraints in fresh sessions
## References
- [Vitest Testing Framework](https://vitest.dev/) - Fast unit test framework for TypeScript/JavaScript
- [Jest Documentation](https://jestjs.io/docs/getting-started) - Popular testing framework with extensive mocking capabilities

