Verification Sandwich Pattern: Always Know Your Baseline

James Phoenix

Summary

LLMs generate code without knowing if the current state is clean, leading to confusion about whether failures are new or pre-existing. The verification sandwich pattern solves this by running all quality gates before and after generation, establishing a clean baseline and making it obvious when new issues are introduced.

The Problem

LLMs generate code without knowing if the current state is clean. When tests fail after generation, it’s unclear if the LLM broke something or if tests were already failing. This ambiguity wastes time debugging pre-existing issues instead of focusing on new changes. Without a baseline, every failure looks like a new problem.

The Solution

Always sandwich code generation between two verification steps: (1) Pre-Verification establishes a clean baseline by running all quality gates before making changes, (2) Generation makes the code changes, (3) Post-Verification runs the same gates again to detect only new issues. This pattern makes it obvious what changed and eliminates debugging of pre-existing failures.

The Problem: Ambiguous Failures

Imagine asking an LLM to add a new feature. It generates code, you run the tests, and 3 tests fail.

Question: Did the LLM break something, or were those tests already failing?

Without knowing the baseline state, you can’t tell. This leads to:

  • Wasted debugging time investigating pre-existing failures
  • False blame on the LLM for issues it didn’t cause
  • Missed regressions when new failures are hidden among old ones
  • Uncertainty about whether it’s safe to merge

Real-World Example

# You ask the LLM to add user authentication
$ claude "Add user authentication to the API"

# LLM generates code
# You run tests
$ npm test

FAILED:
  - user.test.ts:45 - "should hash passwords"
  - user.test.ts:67 - "should validate email format"
  - auth.test.ts:23 - "should return 401 for invalid token"

# Now what?
# Were these tests failing before?
# Did the LLM break them?
# Are they related to authentication at all?

You have no idea what changed. The only way to know is to:

  1. Revert the LLM’s changes
  2. Run tests again
  3. Compare the results

This takes 5-10 minutes every time you generate code.
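
One way to reconstruct that baseline manually is to shelve the LLM's changes, re-test, and restore (a sketch; assumes the changes are still uncommitted):

# Temporarily shelve the LLM's changes
$ git stash push -m "llm-changes"

# Re-run the suite against the pre-generation state
$ npm test   # were these failures already here?

# Restore the LLM's changes
$ git stash pop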

The Solution: Verification Sandwich

The verification sandwich pattern eliminates ambiguity by always knowing your baseline.

┌─────────────────────────────────────┐
│  1. PRE-VERIFICATION (Baseline)     │
│     ├─ Run tests → All pass ✓       │
│     ├─ Run type check → Clean ✓     │
│     └─ Run linter → Clean ✓         │
├─────────────────────────────────────┤
│  2. GENERATION                      │
│     └─ Make the code change         │
├─────────────────────────────────────┤
│  3. POST-VERIFICATION (Delta)       │
│     ├─ Run tests → Detect failures  │
│     ├─ Run type check → Find errors │
│     └─ Run linter → Catch issues    │
└─────────────────────────────────────┘

The Key Insight

If pre-verification fails, STOP immediately. Don’t generate code on top of a broken baseline.

This forces you to:

  1. Fix existing issues first
  2. Establish a clean state
  3. Only then make new changes

Result: Post-verification failures are guaranteed to be from the new changes.
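
In shell terms, the whole pattern collapses into one line (using the verify.sh script built in Step 2 below):

# Clean baseline → generate → re-verify
$ ./scripts/verify.sh && claude "Add user authentication" && ./scripts/verify.sh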

Implementation

Step 1: Define Your Quality Gates

A quality gate is any automated check that verifies correctness:

# Common quality gates
npm test              # Unit & integration tests
npm run type-check    # TypeScript type checking
npm run lint          # ESLint
npm run format:check  # Prettier
npm run build         # Compilation

Choose gates that are:

  • Fast (< 30 seconds total)
  • Deterministic (same input → same output)
  • Comprehensive (cover most common errors)
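
Since later steps invoke npm run verify, it also helps to register the combined gates as a single npm script. One way to wire it up (a sketch; npm pkg set requires npm 7.24+):

# Point "npm run verify" at the script from Step 2
$ npm pkg set scripts.verify="bash scripts/verify.sh"

# Now the gates run with:
$ npm run verify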

Step 2: Create a Verification Script

#!/bin/bash
# scripts/verify.sh

set -e  # Exit on any failure

echo "🔍 Running quality gates..."

echo "  ├─ Type checking..."
npm run type-check

echo "  ├─ Linting..."
npm run lint

echo "  ├─ Testing..."
npm test

echo "  └─ Building..."
npm run build

echo "✅ All quality gates passed!"

Key details:

  • Use set -e to stop on first failure
  • Run fast checks first (type-check before tests)
  • Provide clear output showing progress
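
Before first use, make the script executable and record the bit in git so CI inherits it:

$ chmod +x scripts/verify.sh
$ git update-index --chmod=+x scripts/verify.sh  # persist the executable bit
$ ./scripts/verify.sh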

Step 3: Pre-Verification Hook

Use Claude Code hooks to run verification automatically before generation (hook names and config location vary across Claude Code versions, so treat the config below as illustrative):

// .claude/config.json
{
  "hooks": {
    "pre-request": "./scripts/verify.sh"
  }
}

Now, every time you ask Claude to generate code:

# You run:
$ claude "Add user authentication"

# Automatically runs first:
$ ./scripts/verify.sh
🔍 Running quality gates...
  ├─ Type checking... ✓
  ├─ Linting... ✓
  ├─ Testing... ✓
  └─ Building... ✓
✅ All quality gates passed!

# Only then does generation happen

If verification fails, the hook blocks the request:

$ claude "Add user authentication"

$ ./scripts/verify.sh
🔍 Running quality gates...
  ├─ Type checking... ✓
  ├─ Linting... ✓
  ├─ Testing... ✗
    FAILED: user.test.ts:45 - "should hash passwords"

❌ Quality gates failed. Fix issues before generating new code.

[Request blocked]

This forces you to fix the failing test before proceeding.

Step 4: Post-Verification Hook

After generation, automatically run verification again:

// .claude/config.json
{
  "hooks": {
    "pre-request": "./scripts/verify.sh",
    "post-request": "./scripts/verify.sh"
  }
}

Now the full workflow is:

$ claude "Add user authentication"

# 1. Pre-verification
🔍 Running quality gates...
  ✓ All gates pass

# 2. Generation
📝 Adding authentication...
   ├─ Created src/auth.ts
   ├─ Updated src/api.ts
   └─ Added tests in auth.test.ts

# 3. Post-verification
🔍 Running quality gates...
  ├─ Type checking... ✓
  ├─ Linting... ✓
  ├─ Testing... ✗
    FAILED: auth.test.ts:23 - "should return 401 for invalid token"
  └─ Building... (skipped)

❌ New issues introduced:
  - auth.test.ts:23

Key insight: Because pre-verification passed, you know this failure is from the new code.

Step 5: Manual Verification (No Hooks)

If you’re not using hooks, manually run the pattern:

# 1. Pre-verification
$ npm run verify
✅ All quality gates passed!

# 2. Generation
$ claude "Add user authentication"
# ... generates code ...

# 3. Post-verification
$ npm run verify
❌ Tests failed: auth.test.ts:23

This is less automated but still effective.
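
If you run the manual version often, a small shell function keeps the steps honest (a sketch for your shell profile; assumes the claude CLI takes the prompt as its argument, as in the examples above):

# Wrap any claude prompt in the sandwich; abort on a dirty baseline
sandwich() {
  ./scripts/verify.sh || { echo "❌ Fix the baseline first."; return 1; }
  claude "$1"
  ./scripts/verify.sh
}

# Usage: sandwich "Add user authentication"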

Advanced Patterns

Pattern 1: Selective Verification

For large codebases, running all tests is slow. Use targeted verification:

#!/bin/bash
# scripts/verify-targeted.sh

set -e

# Get changed files
CHANGED_FILES=$(git diff --name-only HEAD)

if echo "$CHANGED_FILES" | grep -q "src/auth"; then
  echo "🔍 Running auth tests..."
  npm test -- --testPathPattern=auth
else
  echo "🔍 Running all tests..."
  npm test
fi
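
Note that --testPathPattern is a Jest CLI flag; if you use a different test runner, substitute its file-filtering equivalent.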

Pattern 2: Progressive Verification

Run fast checks first, skip slow checks if fast ones fail:

#!/bin/bash
# scripts/verify-progressive.sh

set -e

echo "⚡ Fast checks..."
npm run type-check  # 2 seconds
npm run lint        # 3 seconds

echo "🧪 Running tests (this may take a while)..."
npm test            # 30 seconds

echo "🏗️  Building..."
npm run build       # 10 seconds

If type-check fails (2 seconds), you don’t waste 40 seconds on tests and build.

Pattern 3: Parallel Verification

Run independent checks in parallel:

#!/bin/bash
# scripts/verify-parallel.sh

set -e

echo "🔍 Running quality gates in parallel..."

# Run checks in parallel
npm run type-check &
PID_TYPECHECK=$!

npm run lint &
PID_LINT=$!

npm test &
PID_TEST=$!

# Wait for all to complete
wait $PID_TYPECHECK || exit 1
wait $PID_LINT || exit 1
wait $PID_TEST || exit 1

echo "✅ All quality gates passed!"

This cuts verification time from 35s to 30s (limited by slowest check).
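
One caveat: backgrounded jobs interleave their stdout, so failure output can be hard to read. Redirecting each check to its own log file and printing only the failed check's log keeps the output legible.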

Pattern 4: Verification with Context

Save pre-verification results for comparison:

#!/bin/bash
# scripts/verify-with-context.sh
# Usage: ./scripts/verify-with-context.sh [output-file]

set -e

OUT="${1:-test-results.txt}"

# Run tests and save results (without pipefail, the pipeline's exit
# status is tee's, so set -e won't abort before we count failures)
npm test 2>&1 | tee "$OUT"

# Count failures
FAILURES=$(grep -c "FAILED" "$OUT" || true)

if [ "$FAILURES" -gt 0 ]; then
  echo "❌ $FAILURES test(s) failed"
  exit 1
else
  echo "✅ All tests passed"
fi

Then run it before and after generation, and diff the saved results:

# Before generation
$ ./scripts/verify-with-context.sh test-results-before.txt

# ... generation happens ...

# After generation
$ ./scripts/verify-with-context.sh test-results-after.txt

# Compare before/after
$ diff test-results-before.txt test-results-after.txt

Real-World Example

Scenario: Adding a new API endpoint

# 1. Pre-verification
$ ./scripts/verify.sh
🔍 Running quality gates...
  ├─ Type checking... ✓ (0 errors)
  ├─ Linting... ✓ (0 warnings)
  ├─ Testing... ✓ (124 passed)
  └─ Building... ✓
✅ All quality gates passed!

# 2. Generation
$ claude "Add GET /api/users/:id endpoint"

📝 Adding endpoint...
   ├─ Created src/api/users.ts
   ├─ Updated src/api/routes.ts
   └─ Added tests in users.test.ts

# 3. Post-verification
$ ./scripts/verify.sh
🔍 Running quality gates...
  ├─ Type checking... ✗ (1 error)
    src/api/users.ts:15:20 - Property 'id' does not exist on type 'Request'
  └─ (remaining checks skipped)

❌ New issues introduced:
  - Type error in src/api/users.ts:15

# 4. Fix the issue
$ claude "Fix the type error in users.ts"

📝 Fixing type error...
   └─ Updated src/api/users.ts (use req.params.id)

# 5. Post-verification (automatic)
$ ./scripts/verify.sh
🔍 Running quality gates...
  ├─ Type checking... ✓ (0 errors)
  ├─ Linting... ✓ (0 warnings)
  ├─ Testing... ✓ (125 passed) [+1 new test]
  └─ Building... ✓
✅ All quality gates passed!

Result: You know exactly what changed at each step:

  • After first generation: 1 type error introduced
  • After fix: Error resolved, all gates pass

When NOT to Use

The verification sandwich pattern isn’t always necessary:

❌ Skip for Trivial Changes

# Documentation updates
$ claude "Fix typo in README.md"
# No need to run tests

# Comment changes
$ claude "Add JSDoc comments to utils.ts"
# Type-check is enough

❌ Skip During Exploration

# Trying different approaches
$ claude "Try implementing this with recursion"
# Run verification manually when done exploring

❌ Skip for Read-Only Requests

# Questions about code
$ claude "Explain how the auth system works"
# No code changes, no verification needed

✅ Always Use for Production Code

# Feature development
$ claude "Add user registration"
✅ Use verification sandwich

# Bug fixes
$ claude "Fix race condition in payment processing"
✅ Use verification sandwich

# Refactoring
$ claude "Extract helper functions from AuthService"
✅ Use verification sandwich

Best Practices

1. Keep Verification Fast

Target: < 30 seconds total

If verification is slow, developers will skip it.

# ✓ Fast verification (20s)
npm run type-check  # 2s
npm run lint        # 3s
npm test            # 15s

# ✗ Slow verification (5 min)
npm run type-check  # 2s
npm run lint        # 3s
npm test            # 15s
npm run e2e-test    # 4min 40s  ← Too slow!

Solution: Move slow tests to CI, keep local verification fast.
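
One way to make the split explicit is a second script that layers the slow suite on top of the fast gates (a sketch; assumes an npm run e2e-test script like the one above). Local hooks call verify.sh; CI calls the full version:

#!/bin/bash
# scripts/verify-full.sh — fast gates plus the slow suite (CI only)

set -e

./scripts/verify.sh   # the fast local gates
npm run e2e-test      # slow end-to-end suite (~5 min)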

2. Make Verification Obvious

Use clear output that shows exactly what passed/failed:

# ✓ Clear output
🔍 Running quality gates...
  ├─ Type checking... ✓
  ├─ Linting... ✓
  ├─ Testing... ✗
    FAILED: auth.test.ts:23
  └─ Building... (skipped)

# ✗ Unclear output
Running checks...
Error: Command failed

3. Fail Fast

Stop on first failure instead of running all checks:

# ✓ Fail fast (stops after type-check)
set -e
npm run type-check  # ✗ Fails
# npm run lint (skipped)
# npm test (skipped)

# ✗ Run all checks even after failure
npm run type-check || true  # ✗ Fails but continues
npm run lint                # Runs anyway
npm test                    # Runs anyway

4. Version Control Integration

Run pre-verification on checkout:

#!/bin/bash
# .git/hooks/post-checkout
echo "🔍 Verifying clean state after checkout..."
./scripts/verify.sh

This catches issues immediately after switching branches.

5. CI/CD Integration

Use the same verification script in CI:

# .github/workflows/ci.yml
name: CI
on: [push, pull_request]
jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
      - run: npm ci
      - run: ./scripts/verify.sh

This ensures local and CI verification are identical.

Common Pitfalls

Pitfall 1: Skipping Pre-Verification

# ✗ Skipping pre-verification
$ claude "Add feature X"
# ... generates code ...
$ npm test
FAILED: 3 tests

# Now you don't know if these 3 failures are new or old

Solution: Always run pre-verification, even if you “think” the state is clean.

Pitfall 2: Ignoring Pre-Verification Failures

# ✗ Continuing despite failures
$ ./scripts/verify.sh
❌ Tests failed: user.test.ts:45

$ claude "Add feature X anyway"  ← Bad!

Solution: Fix the baseline before making new changes.

Pitfall 3: Different Pre/Post Verification

# ✗ Different checks
# Pre-verification
npm run type-check

# Post-verification
npm test  ← Different gates!

Solution: Use identical verification for pre and post.

Pitfall 4: Non-Deterministic Tests

# ✗ Flaky tests
$ npm test
Passed (124/124)

$ npm test
Failed (1/124)  ← Different result!

Solution: Fix flaky tests first, or exclude them from verification.
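
With Jest, one way to quarantine known-flaky tests while you fix them is --testPathIgnorePatterns (a sketch; the tests/flaky/ directory is a hypothetical convention):

# Exclude quarantined tests from verification runs
$ npm test -- --testPathIgnorePatterns="tests/flaky/"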

Measuring Success

Key Metrics

  1. Baseline confidence: % of runs where pre-verification passes. Target: >95%. (A simple way to track this is sketched below.)
  2. Delta clarity: % of post-verification failures that come from the new code. Target: 100% (guaranteed by the pattern).
  3. Debugging time: Time spent investigating failures. Target: 50% reduction (no more “was this already broken?”).
  4. False blame rate: % of failures blamed on the LLM that were pre-existing. Target: 0% (eliminated by pre-verification).
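
A minimal way to track the first metric is to log each pre-verification outcome and compute the pass rate (a sketch; .verify-log is a hypothetical file):

# Append one line per pre-verification run
$ if ./scripts/verify.sh; then echo pass >> .verify-log; else echo fail >> .verify-log; fi

# Baseline confidence = share of runs that passed
$ awk '/pass/ {p++} END {printf "baseline confidence: %.0f%% (%d/%d runs)\n", 100*p/NR, p, NR}' .verify-log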

Conclusion

The verification sandwich pattern is the simplest, highest-impact workflow improvement for AI-assisted development.

Core principle: Never generate code on top of a broken baseline.

Implementation:

  1. Run all quality gates before generation (pre-verification)
  2. Make the code change (generation)
  3. Run all quality gates after generation (post-verification)

Result: Instant clarity on what changed and what broke.

Key insight: Pre-verification failures are blockers, not warnings. Fix them first.
