Test-Based Regression Patching: 50% Faster Bug Fixes

James Phoenix

Summary

Write a failing test that reproduces the bug before asking an LLM to fix it. Giving the LLM a concrete verification target typically cuts fix iterations by 50%+ and prevents regressions. The test becomes a quality gate that definitively proves when the bug is fixed.

The Problem

When debugging with LLMs, developers often iterate multiple times on broken solutions because the LLM doesn’t have a clear, verifiable target. Without a test, you rely on manual verification, which is slow and error-prone. The LLM may think it fixed the bug when it only addressed symptoms, leading to 5-10+ iterations.

The Solution

Before asking the LLM to fix a bug, write a failing test that reproduces it. This test becomes the success criterion – when it passes, the bug is definitively fixed. The LLM can iterate autonomously until the test passes, reducing human verification overhead from 5-10 manual checks to a single test run.

The Problem

When debugging with AI coding agents, you often encounter this frustrating cycle:

1. User: "Fix the login bug - users can't sign in"
2. LLM: "I've fixed it by updating the auth handler"
3. User tests: Still broken
4. User: "Still not working, I see error X"
5. LLM: "Try this updated version"
6. User tests: Different error
7. User: "Now I get error Y"
8. LLM: "Here's another fix"
9. User tests: Back to original error
10. (Repeat 3-7 more times...)

This iterative debugging loop wastes time and creates frustration because:

1. No Clear Success Criteria

The LLM doesn’t have a programmatic way to verify its fix worked. It relies on your manual testing and feedback, which is:

  • Slow: Each iteration requires manual testing
  • Ambiguous: “Still broken” doesn’t tell the LLM what’s wrong
  • Error-prone: You might miss edge cases or regression bugs
  • Exhausting: 5-10 iterations burn developer time and energy

2. Symptom vs. Root Cause

Without a test, the LLM often fixes symptoms rather than root causes:

// Bug: User authentication fails silently

// ❌ Symptom fix (doesn't solve root cause)
if (!user) {
  console.log('No user found'); // Added logging
  return null;
}

// ✅ Root cause fix (test would catch this)
if (!user) {
  throw new AuthenticationError('User not found', { userId });
}

The symptom fix might appear to work in basic testing, but the underlying issue (silent failures) remains.

3. Regression Risk

When fixing bugs without tests, you risk introducing new bugs:

// Original bug: Race condition in async handler
async function handleLogin(email: string) {
  const user = await findUser(email);
  const session = await createSession(user.id);
  return session;
}

// LLM's fix (solves race condition but breaks error handling)
async function handleLogin(email: string) {
  const user = await findUser(email);
  if (!user) return null; // ❌ Silent failure introduced
  const session = await createSession(user.id);
  return session;
}

Without tests for existing behavior, you don’t know if the fix broke something else.
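
A behavior test written before the fix would catch this collateral damage. Here is a minimal sketch, assuming Vitest (as in the rest of this article) and the handleLogin function above – it fails against the "fixed" version because unknown users now resolve silently to null:

import { test, expect } from 'vitest';
import { handleLogin } from './auth';

// Pins the existing contract: unknown users should surface an error,
// not silently resolve to null.
test('should reject unknown users rather than returning null', async () => {
  await expect(handleLogin('[email protected]')).rejects.toThrow();
});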

The Cost

For a typical bug fix:

Manual debugging: 5-10 iterations × 2-3 minutes = 10-30 minutes
With frustration overhead: 15-45 minutes per bug

For a team fixing 10 bugs/week:
10 bugs × 20 min = 200 minutes/week = 3.3 hours/week wasted

The Solution

Write a failing test that reproduces the bug BEFORE asking the LLM to fix it.

This simple practice transforms debugging from an ambiguous iterative process into a concrete verification loop:

1. Write failing test that reproduces the bug
2. Ask LLM to fix the code
3. Run test
4. If test passes → Bug fixed ✅
   If test fails → LLM iterates (with clear error message)

Why This Works

1. Concrete Success Criteria

The test definitively proves when the bug is fixed:

// Before: Ambiguous verification
"The login still doesn't work"// After: Clear verification
test('should authenticate user with valid credentials', async () => {
  const result = await authenticateUser('[email protected]', 'password123');
  expect(result).toEqual({
    success: true,
    user: { email: '[email protected]' },
    sessionToken: expect.any(String),
  });
});

2. Forces Root Cause Thinking

Writing a test forces you to understand the bug’s root cause:

// Vague problem: "Login is broken"
// What does "broken" mean? Silent failure? Wrong error? Timeout?

// Test forces specificity:
test('should throw AuthenticationError when user not found', async () => {
  await expect(authenticateUser('[email protected]', 'pass'))
    .rejects.toThrow(AuthenticationError);
});

test('should surface network timeouts as NetworkError', async () => {
  mockApiTimeout();
  await expect(authenticateUser('[email protected]', 'pass'))
    .rejects.toThrow(NetworkError);
});

3. Prevents Regressions

The test remains in your suite, preventing the bug from returning:

// This test runs on every commit
test('regression: auth should handle race condition', async () => {
  // Simulate rapid concurrent requests
  const promises = Array.from({ length: 10 }, () =>
    authenticateUser('[email protected]', 'password123')
  );
  
  const results = await Promise.all(promises);
  
  // All should succeed without race condition errors
  results.forEach(result => {
    expect(result.success).toBe(true);
  });
});

Future code changes that reintroduce the bug will be caught immediately.

4. Enables Autonomous LLM Iteration

With a test, the LLM can iterate without human verification:

1. LLM attempts fix #1
2. Runs test → Fails with "TypeError: Cannot read property 'id' of null"
3. LLM analyzes error → Attempts fix #2
4. Runs test → Fails with "AuthenticationError: User not found"
5. LLM analyzes error → Attempts fix #3
6. Runs test → Passes ✅
7. Done

You’re only notified when the test passes, saving you 3-6 manual verification cycles.
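
This loop is easy to automate. Below is a minimal sketch, assuming a hypothetical requestFix function that wraps whatever LLM API you use – only the Vitest invocation is a real command; the rest is illustrative scaffolding:

import { execSync } from 'node:child_process';
import { readFileSync, writeFileSync } from 'node:fs';

// Hypothetical: sends the buggy source plus the test failure output to an
// LLM and returns revised source. Implementation depends on your provider.
type RequestFix = (source: string, testOutput: string) => Promise<string>;

async function fixUntilGreen(
  sourceFile: string,
  testFile: string,
  requestFix: RequestFix,
  maxAttempts = 5
): Promise<boolean> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // Run only the regression test; execSync throws on a non-zero exit code
      execSync(`npx vitest run ${testFile}`, { stdio: 'pipe', encoding: 'utf8' });
      console.log(`Test passed on attempt ${attempt}`);
      return true; // notify the human only now
    } catch (err: any) {
      // Feed the concrete failure output back to the LLM and apply its patch
      const testOutput = `${err.stdout ?? ''}\n${err.stderr ?? ''}`;
      const source = readFileSync(sourceFile, 'utf8');
      writeFileSync(sourceFile, await requestFix(source, testOutput));
    }
  }
  return false; // escalate to a human after maxAttempts
}

Capping attempts matters: if the LLM can't converge within a handful of tries, the failure output usually means the test or the diagnosis needs human attention.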

Implementation

Step 1: Reproduce the Bug

Before writing a test, understand exactly how to reproduce the bug:

Reproduction Steps:
1. Navigate to /login
2. Enter email: [email protected]
3. Enter password: password123
4. Click "Sign In"
5. Observe: Error "Cannot read property 'id' of undefined"

Expected: User is authenticated and redirected to dashboard
Actual: Error thrown, login fails

Step 2: Write a Failing Test

Translate the reproduction steps into a test:

import { describe, test, expect, beforeEach } from 'vitest';
import { authenticateUser } from './auth';
import { mockDatabase, hashPassword } from './test-utils'; // assumes hashPassword is exported by test-utils

describe('Login Bug - Cannot read property id of undefined', () => {
  beforeEach(() => {
    // Setup: Create test user in database
    mockDatabase.users.create({
      email: '[email protected]',
      passwordHash: hashPassword('password123'),
    });
  });

  test('should authenticate user without throwing', async () => {
    // This currently fails with "Cannot read property 'id' of undefined"
    const result = await authenticateUser(
      '[email protected]',
      'password123'
    );

    expect(result).toBeDefined();
    expect(result.success).toBe(true);
    expect(result.user.email).toBe('[email protected]');
  });
});

Verify the test fails:

$ npm test auth.test.ts

❌ FAIL  Login Bug - Cannot read property id of undefined
  ✕ should authenticate user without throwing (24 ms)

  TypeError: Cannot read property 'id' of undefined
      at authenticateUser (auth.ts:42:18)

Good – The test reproduces the bug.

Step 3: Provide Test to LLM

Now give the LLM both the failing test and the buggy code:

I have a bug in the authentication system. Here's a failing test that reproduces it:

[Paste test code]

The test fails with: "TypeError: Cannot read property 'id' of undefined"

Here's the buggy code:

[Paste auth.ts]

Please fix the code so the test passes. Run the test after your changes to verify.
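
If you fix bugs this way often, the prompt is worth templating. A minimal sketch – the file paths and wording are illustrative, not a prescribed format:

import { readFileSync } from 'node:fs';

// Assemble the bug-fix prompt from the failing test, its failure output,
// and the buggy source file.
function buildFixPrompt(testPath: string, sourcePath: string, failureOutput: string): string {
  return [
    "I have a bug. Here's a failing test that reproduces it:",
    readFileSync(testPath, 'utf8'),
    `The test fails with: ${failureOutput}`,
    "Here's the buggy code:",
    readFileSync(sourcePath, 'utf8'),
    'Please fix the code so the test passes. Run the test after your changes to verify.',
  ].join('\n\n');
}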

Step 4: LLM Iterates Until Test Passes

The LLM now has a clear target:

// LLM's fix attempt
async function authenticateUser(email: string, password: string) {
  const user = await db.users.findByEmail(email);
  
  // Fix: Add null check before accessing user.id
  if (!user) {
    throw new AuthenticationError('User not found');
  }
  
  const isValid = await verifyPassword(password, user.passwordHash);
  
  if (!isValid) {
    throw new AuthenticationError('Invalid password');
  }
  
  const session = await createSession(user.id); // ✅ Now safe
  
  return {
    success: true,
    user: { email: user.email },
    sessionToken: session.token,
  };
}

Run test:

$ npm test auth.test.ts

✅ PASS  Login Bug - Cannot read property id of undefined
  ✓ should authenticate user without throwing (18 ms)

Test Suites: 1 passed, 1 total
Tests:       1 passed, 1 total

Bug fixed – Test passes.

Step 5: Add Edge Case Tests

Once the basic fix works, add tests for edge cases:

describe('Authentication edge cases', () => {
  test('should handle non-existent user gracefully', async () => {
    await expect(
      authenticateUser('[email protected]', 'password')
    ).rejects.toThrow(AuthenticationError);
  });

  test('should handle incorrect password', async () => {
    await expect(
      authenticateUser('[email protected]', 'wrongpassword')
    ).rejects.toThrow(AuthenticationError);
  });

  test('should handle database connection errors', async () => {
    mockDatabase.simulateConnectionError();
    
    await expect(
      authenticateUser('[email protected]', 'password123')
    ).rejects.toThrow(DatabaseError);
  });
});

Ask the LLM to ensure all tests pass.

Real-World Example

Bug Report

Bug: API returns 500 error when user has no subscriptions

Reproduction:
1. Create user without subscriptions
2. GET /api/user/subscriptions
3. Observe: 500 Internal Server Error

Expected: 200 OK with empty array []

Step 1: Write Failing Test

import { describe, test, expect } from 'vitest';
import { getUserSubscriptions } from './subscriptions';
import { createMockUser } from './test-utils';

describe('Subscription Bug - 500 error for users without subscriptions', () => {
  test('should return empty array for user with no subscriptions', async () => {
    const user = await createMockUser({ subscriptions: [] });
    
    const result = await getUserSubscriptions(user.id);
    
    expect(result).toEqual({
      subscriptions: [],
      total: 0,
    });
  });
});

Run test:

❌ FAIL
  TypeError: Cannot read property 'map' of undefined
      at getUserSubscriptions (subscriptions.ts:12:25)

Step 2: Prompt LLM

Bug: getUserSubscriptions() crashes when user has no subscriptions.

Failing test:
[Paste test code]

Buggy code:
[Paste subscriptions.ts]

Fix the code so the test passes.

Step 3: LLM Fixes

// Before (buggy):
async function getUserSubscriptions(userId: string) {
  const user = await db.users.findById(userId);
  return {
    subscriptions: user.subscriptions.map(formatSubscription), // ❌ Crashes if undefined
    total: user.subscriptions.length,
  };
}

// After (fixed):
async function getUserSubscriptions(userId: string) {
  const user = await db.users.findById(userId);
  
  if (!user) {
    throw new NotFoundError('User not found');
  }
  
  const subscriptions = user.subscriptions || []; // ✅ Handle undefined/null
  
  return {
    subscriptions: subscriptions.map(formatSubscription),
    total: subscriptions.length,
  };
}

Run test:

✅ PASS
  ✓ should return empty array for user with no subscriptions (15 ms)

Step 4: Expand Test Coverage

describe('Subscription retrieval', () => {
  test('should return user subscriptions', async () => {
    const user = await createMockUser({
      subscriptions: [
        { plan: 'pro', status: 'active' },
        { plan: 'enterprise', status: 'active' },
      ],
    });
    
    const result = await getUserSubscriptions(user.id);
    
    expect(result.total).toBe(2);
    expect(result.subscriptions).toHaveLength(2);
  });

  test('should handle non-existent user', async () => {
    await expect(
      getUserSubscriptions('nonexistent-id')
    ).rejects.toThrow(NotFoundError);
  });
});

Best Practices

1. Test at the Right Level

Choose the appropriate test level for the bug:

// ❌ Too low-level (tests implementation details)
test('should call findByEmail with correct email', () => {
  authenticateUser('[email protected]', 'pass');
  expect(mockDb.findByEmail).toHaveBeenCalledWith('[email protected]');
});

// ✅ Right level (tests behavior)
test('should authenticate valid user', async () => {
  const result = await authenticateUser('[email protected]', 'pass');
  expect(result.success).toBe(true);
});

// ✅ Also good (integration test for complex bugs)
test('should authenticate user end-to-end', async () => {
  const response = await request(app)
    .post('/api/auth/login')
    .send({ email: '[email protected]', password: 'pass' });
  
  expect(response.status).toBe(200);
  expect(response.body.sessionToken).toBeDefined();
});

Rule of thumb: Test at the lowest level that reliably reproduces the bug.

2. Make Tests Deterministic

Avoid flaky tests that sometimes pass/fail:

// ❌ Flaky (depends on timing)
test('should complete within 100ms', async () => {
  const start = Date.now();
  await processData();
  expect(Date.now() - start).toBeLessThan(100);
});

// ✅ Deterministic (tests behavior, not timing)
test('should process all items', async () => {
  const result = await processData([1, 2, 3]);
  expect(result).toEqual([2, 4, 6]);
});

3. Include Error Messages in Tests

Specific assertions produce failure messages that tell the LLM exactly what's expected:

// ❌ Generic assertion
expect(result).toBe(true);

// ✅ Descriptive assertion
expect(result.success).toBe(true);
expect(result.user).toBeDefined();
expect(result.user.email).toBe('[email protected]');

4. Test the Fix, Not Just the Bug

Don’t just verify the error is gone – verify the correct behavior:

// ❌ Only tests that error is fixed
test('should not throw error', async () => {
  await expect(authenticateUser('[email protected]', 'pass'))
    .resolves.toBeDefined();
});

// ✅ Tests correct behavior
test('should return authenticated user', async () => {
  const result = await authenticateUser('[email protected]', 'pass');
  
  expect(result).toMatchObject({
    success: true,
    user: { email: '[email protected]' },
    sessionToken: expect.any(String),
  });
});

5. Keep Tests Focused

One test should verify one behavior:

// ❌ Tests too many things
test('authentication works', async () => {
  const result1 = await authenticateUser('[email protected]', 'pass');
  expect(result1.success).toBe(true);
  
  const result2 = await authenticateUser('[email protected]', 'pass');
  expect(result2.success).toBe(false);
  
  const result3 = await authenticateUser('[email protected]', 'wrong');
  expect(result3.success).toBe(false);
});

// ✅ Focused tests
test('should authenticate valid user', async () => {
  const result = await authenticateUser('[email protected]', 'pass');
  expect(result.success).toBe(true);
});

test('should reject non-existent user', async () => {
  await expect(authenticateUser('[email protected]', 'pass'))
    .rejects.toThrow(AuthenticationError);
});

test('should reject incorrect password', async () => {
  await expect(authenticateUser('[email protected]', 'wrong'))
    .rejects.toThrow(AuthenticationError);
});

Common Pitfalls

❌ Pitfall 1: Writing Tests After the Fix

Problem: Writing tests after fixing the bug doesn’t help reduce iterations.

// Wrong order:
1. Report bug to LLM
2. LLM attempts fix
3. Manually verify (3-5 iterations)
4. Bug fixed
5. Write test

// Correct order:
1. Write failing test
2. Report bug + test to LLM
3. LLM fixes until test passes (autonomous)
4. Bug fixed

❌ Pitfall 2: Vague Test Assertions

Problem: Tests that don’t clearly specify expected behavior:

// ❌ Too vague
test('login should work', async () => {
  const result = await login('[email protected]', 'pass');
  expect(result).toBeTruthy();
});

// ✅ Specific expectations
test('login should return user and session token', async () => {
  const result = await login('[email protected]', 'pass');
  
  expect(result.user).toMatchObject({
    email: '[email protected]',
    id: expect.any(String),
  });
  expect(result.sessionToken).toMatch(/^[a-zA-Z0-9-_]+$/);
  expect(result.expiresAt).toBeInstanceOf(Date);
});

❌ Pitfall 3: Testing Implementation Instead of Behavior

Problem: Tests break when refactoring, even if behavior is correct:

// ❌ Tests implementation
test('should call bcrypt.compare', async () => {
  await authenticateUser('[email protected]', 'pass');
  expect(bcrypt.compare).toHaveBeenCalled();
});

// ✅ Tests behavior
test('should authenticate user with valid password', async () => {
  const result = await authenticateUser('[email protected]', 'validPass');
  expect(result.success).toBe(true);
});

test('should reject user with invalid password', async () => {
  await expect(authenticateUser('[email protected]', 'wrongPass'))
    .rejects.toThrow(AuthenticationError);
});

❌ Pitfall 4: Skipping Edge Cases

Problem: Only testing the happy path leaves edge cases unverified:

// ❌ Only happy path
test('should authenticate user', async () => {
  const result = await authenticateUser('[email protected]', 'pass');
  expect(result.success).toBe(true);
});

// ✅ Covers edge cases
describe('Authentication', () => {
  test('should authenticate valid user', async () => {
    const result = await authenticateUser('[email protected]', 'pass');
    expect(result.success).toBe(true);
  });

  test('should handle empty email', async () => {
    await expect(authenticateUser('', 'pass'))
      .rejects.toThrow(ValidationError);
  });

  test('should handle empty password', async () => {
    await expect(authenticateUser('[email protected]', ''))
      .rejects.toThrow(ValidationError);
  });

  test('should handle malformed email', async () => {
    await expect(authenticateUser('not-an-email', 'pass'))
      .rejects.toThrow(ValidationError);
  });
});

Integration with Other Patterns

Combine with Actor-Critic Pattern

Use tests as the “critic” in an actor-critic loop:

Prompt:
"Fix this bug using actor-critic approach:

1. ACTOR: Generate a fix
2. CRITIC: Run the test and analyze failures
3. ACTOR: Refine based on test output
4. Repeat until test passes

Here's the failing test:
[test code]

Here's the buggy code:
[buggy code]"

Combine with Quality Gates

Tests become automatic quality gates in CI/CD:

# .github/workflows/ci.yml
name: CI

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
      - run: npm install
      - run: npm test # ← Regression tests run automatically

Every commit must pass regression tests, preventing bugs from returning.
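
A consistent naming convention also makes the regression suite easy to find and run on its own. A sketch reusing the subscription example, assuming Vitest's -t/--testNamePattern filter (npx vitest run -t regression):

import { describe, test, expect } from 'vitest';
import { getUserSubscriptions } from './subscriptions';
import { createMockUser } from './test-utils';

// The "regression:" prefix makes these tests greppable and name-filterable.
describe('regression: subscriptions endpoint', () => {
  test('returns empty array for users without subscriptions', async () => {
    const user = await createMockUser({ subscriptions: [] });
    await expect(getUserSubscriptions(user.id)).resolves.toEqual({
      subscriptions: [],
      total: 0,
    });
  });
});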

Combine with Institutional Memory

Add regression tests to your learning files:

# ERRORS.md

## Bug: Authentication Race Condition (Fixed 2025-11-02)

**Symptom**: Login occasionally failed with "Cannot read property 'id' of undefined"

**Root Cause**: Race condition when concurrent requests accessed user object

**Regression Test**:
```typescript
test('regression: auth handles concurrent requests', async () => {
  const promises = Array.from({ length: 10 }, () =>
    authenticateUser('[email protected]', 'pass')
  );
  const results = await Promise.all(promises);
  results.forEach(r => expect(r.success).toBe(true));
});
```

**Prevention**: Always write a regression test when fixing race conditions


Measuring Success

Key Metrics

1. Iterations to Fix

Track how many LLM iterations are needed:

Without test-first:

  • Average iterations: 5-8
  • Range: 3-12

With test-first:

  • Average iterations: 2-3
  • Range: 1-5

Improvement: 50-60% reduction


2. Time to Fix

Measure total time from bug report to verified fix:

Without test-first:

  • Average: 20-30 minutes
  • Includes: Multiple manual verifications, back-and-forth with LLM

With test-first:

  • Average: 8-12 minutes
  • Includes: Writing test (3-5 min) + LLM fix (5-7 min)

Improvement: 50-60% time savings


3. Regression Rate

Track how often bugs return:

Without regression tests:

  • 15-20% of bugs return within 6 months

With regression tests:

  • 0-2% of bugs return (only if test was inadequate)

Improvement: 90%+ reduction in regressions
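
To gather these numbers for your own team, log each fix session as you go. A minimal sketch – the file name and record fields are illustrative:

import { appendFileSync } from 'node:fs';

// Append one record per completed bug fix; compare test-first vs. not later.
function logFixSession(bugId: string, iterations: number, minutes: number, testFirst: boolean) {
  const record = { bugId, iterations, minutes, testFirst, at: new Date().toISOString() };
  appendFileSync('fix-metrics.jsonl', JSON.stringify(record) + '\n');
}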


Conclusion

Test-based regression patching transforms debugging from an ambiguous, iterative process into a concrete, verifiable workflow.

Key Principles:

1. Write the test first – before asking the LLM to fix anything
2. Make it fail – verify the test reproduces the bug
3. Give the test to the LLM – let it iterate until the test passes
4. Expand coverage – add edge case tests once the basic fix works
5. Keep the tests – they prevent regressions forever

The Impact:

  • 50%+ faster bug fixes: from 20-30 minutes to 8-12 minutes
  • Autonomous iteration: the LLM fixes bugs without human verification
  • Near-zero regressions: tests prevent fixed bugs from returning
  • Better code quality: tests force understanding of root causes

When to Use:

  • ✅ Any reproducible bug
  • ✅ Regressions (especially important to prevent recurrence)
  • ✅ Edge cases discovered in production
  • ✅ Bugs reported by users (write the test from their reproduction steps)
  • ✅ Intermittent bugs (the test helps isolate the triggering condition)

When NOT to Use:

  • ❌ Bugs you can't reproduce reliably
  • ❌ Issues that depend on external services you can't mock faithfully
  • ❌ UI bugs (use visual regression testing instead)
  • ❌ Performance issues (use benchmarks instead)

By making test-based regression patching your default debugging workflow, you'll fix bugs faster, prevent regressions, and build a more robust codebase - all while letting the LLM do more of the heavy lifting.

Related Concepts

- [Actor-Critic Adversarial Coding](./actor-critic-adversarial-coding.md) - Use tests as the critic in a generation loop
- [Quality Gates as Information Filters](./quality-gates-as-information-filters.md) - Tests filter out invalid solutions
- [Verification Sandwich Pattern](./verification-sandwich-pattern.md) - Establish clean baseline before and after changes
- [Compounding Effects of Quality Gates](./compounding-effects-quality-gates.md) - How stacked gates multiply quality improvements
- [Claude Code Hooks as Quality Gates](./claude-code-hooks-quality-gates.md) - Automate test execution on every code change
- [Test-Driven Prompting](./test-driven-prompting.md) - Write tests before generating code to constrain LLM output
- [Integration Testing Patterns](./integration-testing-patterns.md) - Integration tests catch more regression bugs
- [Test Custom Infrastructure](./test-custom-infrastructure.md) - Test your testing infrastructure to avoid cascading failures
- [Property-Based Testing for LLM-Generated Code](./property-based-testing.md) - Catch edge cases automatically with invariants
- [Automated Flaky Test Detection](./flaky-test-diagnosis-script.md) - Diagnose intermittent test failures systematically
- [Early Linting Prevents Ratcheting](./early-linting-prevents-ratcheting.md) - Catch issues early before they compound
- [Trust But Verify Protocol](./trust-but-verify-protocol.md) - Tests provide the verification layer
- [Institutional Memory Learning Files](./institutional-memory-learning-files.md) - Document regression tests in ERRORS.md
- [Error Messages as Training Data](./error-messages-as-training.md) - Track recurring errors and link to regression tests
- [Five-Point Error Diagnostic Framework](./five-point-error-diagnostic-framework.md) - Diagnose root cause before writing regression test
- [Context Debugging Framework](./context-debugging-framework.md) - Tests provide concrete verification targets for Layer 4 debugging
- [Prevention Protocol](./prevention-protocol.md) - Tests are a key prevention measure after bug fixes
- [Clean Slate Trajectory Recovery](./clean-slate-trajectory-recovery.md) - Use failing tests as constraints in fresh sessions

References

- [Vitest Testing Framework](https://vitest.dev/) - Fast unit test framework for TypeScript/JavaScript
- [Jest Documentation](https://jestjs.io/docs/getting-started) - Popular testing framework with extensive mocking capabilities
