Test-Based Regression Patching: 50% Faster Bug Fixes

James Phoenix

Summary

Write a failing test that reproduces the bug before asking an LLM to fix it. Giving the LLM a concrete verification target cuts fix iterations by 50% or more, and the retained test prevents regressions. The test becomes a quality gate that definitively proves when the bug is fixed.

The Problem

When debugging with LLMs, developers often iterate multiple times on broken solutions because the LLM doesn’t have a clear, verifiable target. Without a test, you rely on manual verification, which is slow and error-prone. The LLM may think it fixed the bug when it only addressed symptoms, leading to 5-10+ iterations.

The Solution

Before asking the LLM to fix a bug, write a failing test that reproduces it. This test becomes the success criterion – when it passes, the bug is definitively fixed. The LLM can iterate autonomously until the test passes, reducing human verification overhead from 5-10 manual checks to a single test run.

The Problem

When debugging with AI coding agents, you often encounter this frustrating cycle:

1. User: "Fix the login bug - users can't sign in"
2. LLM: "I've fixed it by updating the auth handler"
3. User tests: Still broken
4. User: "Still not working, I see error X"
5. LLM: "Try this updated version"
6. User tests: Different error
7. User: "Now I get error Y"
8. LLM: "Here's another fix"
9. User tests: Back to original error
10. (Repeat 3-7 more times...)

This iterative debugging loop wastes time and creates frustration because:

1. No Clear Success Criteria

The LLM doesn’t have a programmatic way to verify its fix worked. It relies on your manual testing and feedback, which is:

  • Slow: Each iteration requires manual testing
  • Ambiguous: “Still broken” doesn’t tell the LLM what’s wrong
  • Error-prone: You might miss edge cases or regression bugs
  • Exhausting: 5-10 iterations burn developer time and energy

2. Symptom vs. Root Cause

Without a test, the LLM often fixes symptoms rather than root causes:

// Bug: User authentication fails silently

// ❌ Symptom fix (doesn't solve root cause)
if (!user) {
  console.log('No user found'); // Added logging
  return null;
}

// ✅ Root cause fix (test would catch this)
if (!user) {
  throw new AuthenticationError('User not found', { userId });
}

The symptom fix might appear to work in basic testing, but the underlying issue (silent failures) remains.
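
A regression test written against the desired behavior would reject the symptom fix, because it asserts that the missing user is surfaced as an error rather than logged and swallowed. A minimal sketch, assuming a Vitest setup and that authenticateUser and AuthenticationError are exported from the auth module (the email address is a placeholder):

import { test, expect } from 'vitest';
import { authenticateUser, AuthenticationError } from './auth';

test('missing user surfaces as an AuthenticationError, not a silent null', async () => {
  // Fails against the symptom fix (which returns null) and passes only
  // with the root-cause fix that throws.
  await expect(authenticateUser('missing-user@example.com', 'password123'))
    .rejects.toThrow(AuthenticationError);
});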

3. Regression Risk

When fixing bugs without tests, you risk introducing new bugs:

// Original bug: Race condition in async handler
async function handleLogin(email: string) {
  const user = await findUser(email);
  const session = await createSession(user.id);
  return session;
}

// LLM's fix (solves race condition but breaks error handling)
async function handleLogin(email: string) {
  const user = await findUser(email);
  if (!user) return null; // ❌ Silent failure introduced
  const session = await createSession(user.id);
  return session;
}

Without tests for existing behavior, you don’t know if the fix broke something else.
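
One way to avoid this is to pin down the surrounding contract before asking for the fix: one test for the behavior being fixed (concurrent logins succeed) and one for the behavior that must not regress (failures are surfaced, not swallowed). A sketch, assuming handleLogin and AuthenticationError live in the auth module and the emails are placeholders:

import { describe, test, expect } from 'vitest';
import { handleLogin, AuthenticationError } from './auth';

describe('handleLogin contract', () => {
  test('unknown email raises an error instead of returning null', async () => {
    // Catches the silent-failure regression introduced by the fix above.
    await expect(handleLogin('unknown@example.com')).rejects.toThrow(AuthenticationError);
  });

  test('concurrent logins for the same user all succeed', async () => {
    // Reproduces the original race condition with parallel calls.
    const sessions = await Promise.all(
      Array.from({ length: 10 }, () => handleLogin('existing-user@example.com'))
    );
    sessions.forEach(session => expect(session).toBeTruthy());
  });
});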

The Cost

For a typical bug fix:

Manual debugging: 5-10 iterations × 2-3 minutes = 10-30 minutes
With frustration overhead: 15-45 minutes per bug

For a team fixing 10 bugs/week:
10 bugs × 20 min = 200 minutes/week = 3.3 hours/week wasted

The Solution

Write a failing test that reproduces the bug BEFORE asking the LLM to fix it.

This simple practice transforms debugging from an ambiguous iterative process into a concrete verification loop:

1. Write failing test that reproduces the bug
2. Ask LLM to fix the code
3. Run test
4. If test passes → Bug fixed ✅
   If test fails → LLM iterates (with clear error message)

Why This Works

1. Concrete Success Criteria

The test definitively proves when the bug is fixed:

// Before: Ambiguous verification
"The login still doesn't work"// After: Clear verification
test('should authenticate user with valid credentials', () => {
  const result = authenticateUser('[email protected]', 'password123');
  expect(result).toEqual({
    success: true,
    user: { email: '[email protected]' },
    sessionToken: expect.any(String),
  });
}); ✅

2. Forces Root Cause Thinking

Writing a test forces you to understand the bug’s root cause:

// Vague problem: "Login is broken"
// What does "broken" mean? Silent failure? Wrong error? Timeout?

// Test forces specificity:
test('should throw AuthenticationError when user not found', () => {
  expect(() => authenticateUser('[email protected]', 'pass'))
    .toThrow(AuthenticationError);
});

test('should throw NetworkError for network timeouts', async () => {
  mockApiTimeout();
  await expect(authenticateUser('[email protected]', 'pass'))
    .rejects.toThrow(NetworkError);
});

3. Prevents Regressions

The test remains in your suite, preventing the bug from returning:

// This test runs on every commit
test('regression: auth should handle race condition', async () => {
  // Simulate rapid concurrent requests
  const promises = Array.from({ length: 10 }, () =>
    authenticateUser('[email protected]', 'password123')
  );
  
  const results = await Promise.all(promises);
  
  // All should succeed without race condition errors
  results.forEach(result => {
    expect(result.success).toBe(true);
  });
});

Future code changes that reintroduce the bug will be caught immediately.

4. Enables Autonomous LLM Iteration

With a test, the LLM can iterate without human verification:

1. LLM attempts fix #1
2. Runs test → Fails with "TypeError: Cannot read property 'id' of null"
3. LLM analyzes error → Attempts fix #2
4. Runs test → Fails with "AuthenticationError: User not found"
5. LLM analyzes error → Attempts fix #3
6. Runs test → Passes ✅
7. Done

You’re only notified when the test passes, saving you 3-6 manual verification cycles.
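
If you want to drive that loop yourself rather than rely on an agent harness, it can be scripted: run the regression test, capture the failure output, hand it back to the model, apply its patch, and repeat. A rough sketch, assuming Vitest on the command line and a hypothetical generateAndApplyFix() wrapper around your LLM client (not a real API):

import { execSync } from 'node:child_process';
// Hypothetical helper: sends the failure output to the LLM and applies the patch it returns.
import { generateAndApplyFix } from './llm-client';

export async function fixUntilGreen(testFile: string, maxAttempts = 5): Promise<boolean> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // Run only the regression test; execSync throws when the test fails.
      execSync(`npx vitest run ${testFile}`, { stdio: 'pipe' });
      console.log(`Test passed on attempt ${attempt}`);
      return true;
    } catch (error: any) {
      const failureOutput = `${error.stdout ?? ''}\n${error.stderr ?? ''}`;
      // Feed the concrete failure message back to the model for the next attempt.
      await generateAndApplyFix({ testFile, failureOutput });
    }
  }
  console.log(`Still failing after ${maxAttempts} attempts – escalate to a human.`);
  return false;
}

In practice, a coding agent typically runs this loop for you; the sketch only makes the mechanism explicit.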

Implementation

Step 1: Reproduce the Bug

Before writing a test, understand exactly how to reproduce the bug:

**Reproduction Steps**:
1. Navigate to /login
2. Enter email: [email protected]
3. Enter password: password123
4. Click "Sign In"
5. Observe: Error "Cannot read property 'id' of undefined"

**Expected**: User is authenticated and redirected to dashboard
**Actual**: Error thrown, login fails
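
If the bug sits behind an HTTP endpoint, a throwaway script can confirm the reproduction before you formalize it as a test. A quick sketch, assuming the app is running locally and using a placeholder URL and credentials:

// repro.ts – one-off check that the failure reproduces (requires a Node version with global fetch).
const response = await fetch('http://localhost:3000/api/auth/login', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ email: 'test-user@example.com', password: 'password123' }),
});

console.log(response.status);       // Expect 200 on success; a 5xx here confirms a server-side failure
console.log(await response.text()); // Surface the server's error payload

Once the failure is confirmed, translate these exact inputs into the test in Step 2.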

Step 2: Write a Failing Test

Translate the reproduction steps into a test:

import { describe, test, expect, beforeEach } from 'vitest';
import { authenticateUser } from './auth';
import { mockDatabase, hashPassword } from './test-utils'; // hashPassword assumed to be a test helper

describe('Login Bug - Cannot read property id of undefined', () => {
  beforeEach(() => {
    // Setup: Create test user in database
    mockDatabase.users.create({
      email: '[email protected]',
      passwordHash: hashPassword('password123'),
    });
  });

  test('should authenticate user without throwing', async () => {
    // This currently fails with "Cannot read property 'id' of undefined"
    const result = await authenticateUser(
      '[email protected]',
      'password123'
    );

    expect(result).toBeDefined();
    expect(result.success).toBe(true);
    expect(result.user.email).toBe('[email protected]');
  });
});

Verify the test fails:

$ npm test auth.test.ts

❌ FAIL  Login Bug - Cannot read property id of undefined
  ✕ should authenticate user without throwing (24 ms)

  TypeError: Cannot read property 'id' of undefined
      at authenticateUser (auth.ts:42:18)

Good – The test reproduces the bug.

Step 3: Provide Test to LLM

Now give the LLM both the failing test and the buggy code:

I have a bug in the authentication system. Here's a failing test that reproduces it:

[Paste test code]

The test fails with: "TypeError: Cannot read property 'id' of undefined"

Here's the buggy code:

[Paste auth.ts]

Please fix the code so the test passes. Run the test after your changes to verify.

Step 4: LLM Iterates Until Test Passes

The LLM now has a clear target:

// LLM's fix attempt
async function authenticateUser(email: string, password: string) {
  const user = await db.users.findByEmail(email);
  
  // Fix: Add null check before accessing user.id
  if (!user) {
    throw new AuthenticationError('User not found');
  }
  
  const isValid = await verifyPassword(password, user.passwordHash);
  
  if (!isValid) {
    throw new AuthenticationError('Invalid password');
  }
  
  const session = await createSession(user.id); // ✅ Now safe
  
  return {
    success: true,
    user: { email: user.email },
    sessionToken: session.token,
  };
}

Run test:

$ npm test auth.test.ts

✅ PASS  Login Bug - Cannot read property id of undefined
  ✓ should authenticate user without throwing (18 ms)

Test Suites: 1 passed, 1 total
Tests:       1 passed, 1 total

Bug fixed – Test passes.

Step 5: Add Edge Case Tests

Once the basic fix works, add tests for edge cases:

describe('Authentication edge cases', () => {
  test('should handle non-existent user gracefully', async () => {
    await expect(
      authenticateUser('[email protected]', 'password')
    ).rejects.toThrow(AuthenticationError);
  });

  test('should handle incorrect password', async () => {
    await expect(
      authenticateUser('[email protected]', 'wrongpassword')
    ).rejects.toThrow(AuthenticationError);
  });

  test('should handle database connection errors', async () => {
    mockDatabase.simulateConnectionError();
    
    await expect(
      authenticateUser('[email protected]', 'password123')
    ).rejects.toThrow(DatabaseError);
  });
});

Ask the LLM to ensure all tests pass.

Real-World Example

Bug Report

Bug: API returns 500 error when user has no subscriptions

Reproduction:
1. Create user without subscriptions
2. GET /api/user/subscriptions
3. Observe: 500 Internal Server Error

Expected: 200 OK with empty array []

Step 1: Write Failing Test

import { describe, test, expect } from 'vitest';
import { getUserSubscriptions } from './subscriptions';
import { createMockUser } from './test-utils';

describe('Subscription Bug - 500 error for users without subscriptions', () => {
  test('should return empty array for user with no subscriptions', async () => {
    const user = await createMockUser({ subscriptions: [] });
    
    const result = await getUserSubscriptions(user.id);
    
    expect(result).toEqual({
      subscriptions: [],
      total: 0,
    });
  });
});

Run test:

❌ FAIL
  TypeError: Cannot read property 'map' of undefined
      at getUserSubscriptions (subscriptions.ts:12:25)

Step 2: Prompt LLM

Bug: getUserSubscriptions() crashes when user has no subscriptions.

Failing test:
[Paste test code]

Buggy code:
[Paste subscriptions.ts]

Fix the code so the test passes.

Step 3: LLM Fixes

// Before (buggy):
async function getUserSubscriptions(userId: string) {
  const user = await db.users.findById(userId);
  return {
    subscriptions: user.subscriptions.map(formatSubscription), // ❌ Crashes if undefined
    total: user.subscriptions.length,
  };
}

// After (fixed):
async function getUserSubscriptions(userId: string) {
  const user = await db.users.findById(userId);
  
  if (!user) {
    throw new NotFoundError('User not found');
  }
  
  const subscriptions = user.subscriptions || []; // ✅ Handle undefined/null
  
  return {
    subscriptions: subscriptions.map(formatSubscription),
    total: subscriptions.length,
  };
}

Run test:

✅ PASS
  ✓ should return empty array for user with no subscriptions (15 ms)

Step 4: Expand Test Coverage

describe('Subscription retrieval', () => {
  test('should return user subscriptions', async () => {
    const user = await createMockUser({
      subscriptions: [
        { plan: 'pro', status: 'active' },
        { plan: 'enterprise', status: 'active' },
      ],
    });
    
    const result = await getUserSubscriptions(user.id);
    
    expect(result.total).toBe(2);
    expect(result.subscriptions).toHaveLength(2);
  });

  test('should handle non-existent user', async () => {
    await expect(
      getUserSubscriptions('nonexistent-id')
    ).rejects.toThrow(NotFoundError);
  });
});

Best Practices

1. Test at the Right Level

Choose the appropriate test level for the bug:

// ❌ Too low-level (tests implementation details)
test('should call findByEmail with correct email', () => {
  authenticateUser('[email protected]', 'pass');
  expect(mockDb.findByEmail).toHaveBeenCalledWith('[email protected]');
});

// ✅ Right level (tests behavior)
test('should authenticate valid user', async () => {
  const result = await authenticateUser('[email protected]', 'pass');
  expect(result.success).toBe(true);
});

// ✅ Also good (integration test for complex bugs)
test('should authenticate user end-to-end', async () => {
  const response = await request(app)
    .post('/api/auth/login')
    .send({ email: '[email protected]', password: 'pass' });
  
  expect(response.status).toBe(200);
  expect(response.body.sessionToken).toBeDefined();
});

Rule of thumb: Test at the lowest level that reliably reproduces the bug.

2. Make Tests Deterministic

Avoid flaky tests that sometimes pass/fail:

// ❌ Flaky (depends on timing)
test('should complete within 100ms', async () => {
  const start = Date.now();
  await processData();
  expect(Date.now() - start).toBeLessThan(100);
});

// ✅ Deterministic (tests behavior, not timing)
test('should process all items', async () => {
  const result = await processData([1, 2, 3]);
  expect(result).toEqual([2, 4, 6]);
});

3. Include Error Messages in Tests

Error messages help the LLM understand what’s expected:

// ❌ Generic assertion
expect(result).toBe(true);

// ✅ Descriptive assertion
expect(result.success).toBe(true);
expect(result.user).toBeDefined();
expect(result.user.email).toBe('[email protected]');

4. Test the Fix, Not Just the Bug

Don’t just verify the error is gone – verify the correct behavior:

// ❌ Only tests that error is fixed
test('should not throw error', async () => {
  await expect(authenticateUser('[email protected]', 'pass'))
    .resolves.not.toThrow();
});

// ✅ Tests correct behavior
test('should return authenticated user', async () => {
  const result = await authenticateUser('[email protected]', 'pass');
  
  expect(result).toMatchObject({
    success: true,
    user: { email: '[email protected]' },
    sessionToken: expect.any(String),
  });
});

5. Keep Tests Focused

One test should verify one behavior:

// ❌ Tests too many things
test('authentication works', async () => {
  const result1 = await authenticateUser('[email protected]', 'pass');
  expect(result1.success).toBe(true);
  
  const result2 = await authenticateUser('[email protected]', 'pass');
  expect(result2.success).toBe(false);
  
  const result3 = await authenticateUser('[email protected]', 'wrong');
  expect(result3.success).toBe(false);
});

// ✅ Focused tests
test('should authenticate valid user', async () => {
  const result = await authenticateUser('[email protected]', 'pass');
  expect(result.success).toBe(true);
});

test('should reject non-existent user', async () => {
  await expect(authenticateUser('[email protected]', 'pass'))
    .rejects.toThrow(AuthenticationError);
});

test('should reject incorrect password', async () => {
  await expect(authenticateUser('[email protected]', 'wrong'))
    .rejects.toThrow(AuthenticationError);
});

Common Pitfalls

❌ Pitfall 1: Writing Tests After the Fix

Problem: Writing tests after fixing the bug doesn’t help reduce iterations.

// Wrong order:
1. Report bug to LLM
2. LLM attempts fix
3. Manually verify (3-5 iterations)
4. Bug fixed
5. Write test

// Correct order:
1. Write failing test
2. Report bug + test to LLM
3. LLM fixes until test passes (autonomous)
4. Bug fixed

❌ Pitfall 2: Vague Test Assertions

Problem: Tests that don’t clearly specify expected behavior:

// ❌ Too vague
test('login should work', async () => {
  const result = await login('[email protected]', 'pass');
  expect(result).toBeTruthy();
});

// ✅ Specific expectations
test('login should return user and session token', async () => {
  const result = await login('[email protected]', 'pass');
  
  expect(result.user).toMatchObject({
    email: '[email protected]',
    id: expect.any(String),
  });
  expect(result.sessionToken).toMatch(/^[a-zA-Z0-9-_]+$/);
  expect(result.expiresAt).toBeInstanceOf(Date);
});

❌ Pitfall 3: Testing Implementation Instead of Behavior

Problem: Tests break when refactoring, even if behavior is correct:

// ❌ Tests implementation
test('should call bcrypt.compare', async () => {
  await authenticateUser('[email protected]', 'pass');
  expect(bcrypt.compare).toHaveBeenCalled();
});

// ✅ Tests behavior
test('should authenticate user with valid password', async () => {
  const result = await authenticateUser('[email protected]', 'validPass');
  expect(result.success).toBe(true);
});

test('should reject user with invalid password', async () => {
  await expect(authenticateUser('[email protected]', 'wrongPass'))
    .rejects.toThrow(AuthenticationError);
});

❌ Pitfall 4: Skipping Edge Cases

Problem: Only testing the happy path leaves edge cases unverified:

// ❌ Only happy path
test('should authenticate user', async () => {
  const result = await authenticateUser('[email protected]', 'pass');
  expect(result.success).toBe(true);
});

// ✅ Covers edge cases
describe('Authentication', () => {
  test('should authenticate valid user', async () => {
    const result = await authenticateUser('[email protected]', 'pass');
    expect(result.success).toBe(true);
  });

  test('should handle empty email', async () => {
    await expect(authenticateUser('', 'pass'))
      .rejects.toThrow(ValidationError);
  });

  test('should handle empty password', async () => {
    await expect(authenticateUser('[email protected]', ''))
      .rejects.toThrow(ValidationError);
  });

  test('should handle malformed email', async () => {
    await expect(authenticateUser('not-an-email', 'pass'))
      .rejects.toThrow(ValidationError);
  });
});

Integration with Other Patterns

Combine with Actor-Critic Pattern

Use tests as the “critic” in an actor-critic loop:

Prompt:
"Fix this bug using actor-critic approach:

1. ACTOR: Generate a fix
2. CRITIC: Run the test and analyze failures
3. ACTOR: Refine based on test output
4. Repeat until test passes

Here's the failing test:
[test code]

Here's the buggy code:
[buggy code]"

Combine with Quality Gates

Tests become automatic quality gates in CI/CD:

# .github/workflows/ci.yml
name: CI

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
      - run: npm install
      - run: npm test # ← Regression tests run automatically

Every commit must pass regression tests, preventing bugs from returning.

Combine with Institutional Memory

Add regression tests to your learning files:

# ERRORS.md

## Bug: Authentication Race Condition (Fixed 2025-11-02)

**Symptom**: Login occasionally failed with "Cannot read property 'id' of undefined"

**Root Cause**: Race condition when concurrent requests accessed user object

**Regression Test**:
```typescript
test('regression: auth handles concurrent requests', async () => {
  const promises = Array.from({ length: 10 }, () =>
    authenticateUser('[email protected]', 'pass')
  );
  const results = await Promise.all(promises);
  results.forEach(r => expect(r.success).toBe(true));
});
```

**Prevention**: Always write a regression test when fixing race conditions


Measuring Success

Key Metrics

1. Iterations to Fix

Track how many LLM iterations are needed:

Without test-first:

  • Average iterations: 5-8
  • Range: 3-12

With test-first:

  • Average iterations: 2-3
  • Range: 1-5

Improvement: 50-60% reduction


2. Time to Fix

Measure total time from bug report to verified fix:

Without test-first:

  • Average: 20-30 minutes
  • Includes: Multiple manual verifications, back-and-forth with LLM

With test-first:

  • Average: 8-12 minutes
  • Includes: Writing test (3-5 min) + LLM fix (5-7 min)

Improvement: 50-60% time savings


3. Regression Rate

Track how often bugs return:

Without regression tests:

  • 15-20% of bugs return within 6 months

With regression tests:

  • 0-2% of bugs return (only if test was inadequate)

Improvement: 90%+ reduction in regressions
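
If you want these numbers for your own team, the fix workflow can record them as it runs. A sketch that appends one JSON line per fixed bug (the field names and file name are illustrative, not part of any standard):

import { appendFileSync } from 'node:fs';

interface FixRecord {
  bugId: string;        // e.g. issue tracker ID
  iterations: number;   // LLM attempts until the regression test passed
  minutesToFix: number; // wall-clock time from failing test to green
  testFile: string;     // the regression test that now guards this bug
}

// Append one line per bug; aggregate the file later to get averages and ranges.
export function recordFix(record: FixRecord, logPath = 'fix-metrics.jsonl'): void {
  appendFileSync(logPath, JSON.stringify({ ...record, fixedAt: new Date().toISOString() }) + '\n');
}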


Conclusion

Test-based regression patching transforms debugging from an ambiguous, iterative process into a concrete, verifiable workflow.

Key Principles:

1. Write the test first – before asking the LLM to fix anything
2. Make it fail – verify the test reproduces the bug
3. Give the test to the LLM – let it iterate until the test passes
4. Expand coverage – add edge case tests once the basic fix works
5. Keep the tests – they prevent regressions forever

The Impact:

  • 50%+ faster bug fixes: from 20-30 minutes down to 8-12 minutes
  • Autonomous iteration: the LLM fixes bugs without per-attempt human verification
  • Near-zero regressions: retained tests keep fixed bugs from returning
  • Better code quality: writing the test forces understanding of root causes

When to Use:

  • ✅ Any reproducible bug
  • ✅ Regressions (especially important to prevent recurrence)
  • ✅ Edge cases discovered in production
  • ✅ Bugs reported by users (write the test from their reproduction steps)
  • ✅ Intermittent bugs (a test helps isolate the triggering condition)

When NOT to Use:

  • ❌ Bugs you can't reproduce reliably
  • ❌ Issues with external services (mocking required)
  • ❌ UI bugs (use visual regression testing instead)
  • ❌ Performance issues (use benchmarks instead)

By making test-based regression patching your default debugging workflow, you'll fix bugs faster, prevent regressions, and build a more robust codebase - all while letting the LLM do more of the heavy lifting.

Related Concepts

- [Actor-Critic Adversarial Coding](./actor-critic-adversarial-coding.md) - Use tests as the critic in a generation loop
- [Quality Gates as Information Filters](./quality-gates-as-information-filters.md) - Tests filter out invalid solutions
- [Verification Sandwich Pattern](./verification-sandwich-pattern.md) - Establish clean baseline before and after changes
- [Compounding Effects of Quality Gates](./compounding-effects-quality-gates.md) - How stacked gates multiply quality improvements
- [Claude Code Hooks as Quality Gates](./claude-code-hooks-quality-gates.md) - Automate test execution on every code change
- [Test-Driven Prompting](./test-driven-prompting.md) - Write tests before generating code to constrain LLM output
- [Integration Testing Patterns](./integration-testing-patterns.md) - Integration tests catch more regression bugs
- [Test Custom Infrastructure](./test-custom-infrastructure.md) - Test your testing infrastructure to avoid cascading failures
- [Property-Based Testing for LLM-Generated Code](./property-based-testing.md) - Catch edge cases automatically with invariants
- [Automated Flaky Test Detection](./flaky-test-diagnosis-script.md) - Diagnose intermittent test failures systematically
- [Early Linting Prevents Ratcheting](./early-linting-prevents-ratcheting.md) - Catch issues early before they compound
- [Trust But Verify Protocol](./trust-but-verify-protocol.md) - Tests provide the verification layer
- [Institutional Memory Learning Files](./institutional-memory-learning-files.md) - Document regression tests in ERRORS.md
- [Error Messages as Training Data](./error-messages-as-training.md) - Track recurring errors and link to regression tests
- [Five-Point Error Diagnostic Framework](./five-point-error-diagnostic-framework.md) - Diagnose root cause before writing regression test
- [Context Debugging Framework](./context-debugging-framework.md) - Tests provide concrete verification targets for Layer 4 debugging
- [Prevention Protocol](./prevention-protocol.md) - Tests are a key prevention measure after bug fixes
- [Clean Slate Trajectory Recovery](./clean-slate-trajectory-recovery.md) - Use failing tests as constraints in fresh sessions

References

- [Vitest Testing Framework](https://vitest.dev/) - Fast unit test framework for TypeScript/JavaScript
- [Jest Documentation](https://jestjs.io/docs/getting-started) - Popular testing framework with extensive mocking capabilities

Topics: Automated Testing, Bug Fixing, Debugging, Iterative Development, LLM Verification, Quality Gates, Regression Tests, TDD, Test First, Testing
