Trust But Verify Protocol: AI-Generated Tests Over Manual Review

James Phoenix

Summary

Reviewing all AI-generated code manually is time-consuming and error-prone. Instead of reviewing 1000+ lines of generated code, ask the AI to write verification tests and review just the test output. This reduces review burden by 99%, catches bugs at generation time, and creates compound learning where each verification teaches the AI what ‘correct’ looks like.

The Problem

Reviewing all AI-generated code is time-consuming and error-prone. For a 1000-line feature, manually checking every function, edge case, and integration point takes hours, and bugs still slip through. Traditional code review assumes human-written code with human reasoning – but AI-generated code lacks this context, making review even harder.

The Solution

Don’t trust AI output – ask AI to create verification instead. The pattern: AI writes code → AI writes verification (tests, scripts, visual checks) → You review verification output. This shifts focus from reviewing implementation details to validating behavior. Instead of reading 1000 lines of code, you check 10 lines of test output. Bugs are caught immediately while context is fresh, and verification artifacts compound into a quality gate system.

The Problem

When working with AI coding agents, you face a fundamental challenge: how do you verify generated code is correct?

The naive approach is manual code review:

AI: "I've implemented user authentication with password hashing,
     session management, and rate limiting. Here are 847 lines of code."

You: *Starts reading line by line*
     - Is the password hash secure?
     - Are sessions properly invalidated?
     - Is rate limiting configured correctly?
     - Are edge cases handled?
     - Is error handling complete?
     - Are there race conditions?
     *3 hours later, eyes glazing over*

Why Manual Review Fails

1. Scale Problem

AI can generate code 10-100x faster than humans can review it:

  • AI generation: 1000 lines in 2 minutes
  • Human review: 1000 lines in 2-4 hours
  • Result: Review becomes the bottleneck

2. Context Loss

By the time you finish reviewing, you’ve forgotten earlier parts:

Line 1-200: "This authentication logic looks good..."
Line 400-600: "Wait, how does this relate to the session management?"
Line 800-1000: "I need to re-read the beginning to understand this..."

3. False Confidence

Code that looks correct often isn’t:

// Looks good at first glance...
async function createUser(email: string, password: string) {
  const hash = await bcrypt.hash(password, 10);
  const user = await db.users.create({ email, passwordHash: hash });
  return user;
}

// But missing:
// - Email validation
// - Duplicate email check
// - Password strength requirements
// - Input sanitization
// - Error handling
// - Transaction rollback

You think you’ve reviewed it thoroughly, but missed 6 critical issues.
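
For contrast, here is a hedged sketch of what a more complete version might look like. The transaction and lookup helpers on `db` are assumptions about the data layer, not part of the original snippet:

// A hardened sketch of the same function. db.transaction / findByEmail
// are assumed helpers, not taken from the original code.
import bcrypt from 'bcrypt';
import { db } from './database';

const EMAIL_RE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

async function createUser(email: string, password: string) {
  const normalizedEmail = email.trim().toLowerCase();

  // Input validation
  if (!EMAIL_RE.test(normalizedEmail)) throw new Error('Invalid email format');
  if (password.length < 8 || !/[A-Z]/.test(password) || !/\d/.test(password)) {
    throw new Error('Password does not meet strength requirements');
  }

  // Duplicate check + insert in one transaction so a failure rolls back cleanly
  return db.transaction(async (tx) => {
    const existing = await tx.users.findByEmail(normalizedEmail);
    if (existing) throw new Error('Email already registered');

    const passwordHash = await bcrypt.hash(password, 10);
    return tx.users.create({ email: normalizedEmail, passwordHash });
  });
}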

4. Missed Edge Cases

Humans are bad at systematically checking edge cases:

// Did you verify:
// - Empty string inputs?
// - Null/undefined values?
// - Maximum length strings?
// - Special characters?
// - Unicode edge cases?
// - Concurrent requests?
// - Database connection failures?
// - Network timeouts?

Probably not. Too tedious.
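
Automated tests make that tedium cheap: a table-driven test enumerates the cases once and replays them on every run. Below is a minimal sketch using vitest's it.each; validateEmail is a hypothetical helper standing in for whatever input validation the feature exposes, and the expected values are assumptions about its policy:

import { describe, it, expect } from 'vitest';
import { validateEmail } from './validation'; // hypothetical helper

describe('email validation edge cases', () => {
  // Every case runs on every execution -- no human has to remember them.
  it.each([
    { input: '', expected: false },                                   // empty string
    { input: 'no-at-sign.example.com', expected: false },             // malformed
    { input: 'a'.repeat(10_000) + '@example.com', expected: false },  // extreme length
    { input: 'user+tag@example.com', expected: true },                // special characters
    { input: "'; DROP TABLE users;--@example.com", expected: false }, // injection-shaped
  ])('validateEmail($input) -> $expected', ({ input, expected }) => {
    expect(validateEmail(input)).toBe(expected);
  });
});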

5. No Regression Protection

Even if you catch all bugs during review, there’s no artifact preventing regression:

Today: You manually verify authentication works
1 week later: AI modifies authentication code
Result: Previous bugs can re-emerge, no automated check

The Cost

Time cost:

  • 1000-line feature = 3 hours of review
  • 5 features/week = 15 hours/week reviewing
  • 37% of your time spent reading code

Quality cost:

  • Bugs slip through review (human error)
  • No systematic edge case coverage
  • No regression protection
  • False confidence in “reviewed” code

Productivity cost:

  • Review becomes bottleneck
  • AI sits idle waiting for approval
  • Iteration slows down
  • Development velocity tanks

The Solution

Don’t trust AI output – ask AI to create verification instead.

The Trust But Verify Pattern

Instead of:

1. AI writes code
2. You review everything
3. Bugs slip through

Do this:

1. AI writes code
2. AI writes verification (tests, scripts, visual checks)
3. AI runs verification
4. You review verification output (10 lines vs 1000 lines)
5. Fix any failures immediately while context is fresh

Why This Works

1. Verification is Easier to Review

Compare these review tasks:

Manual review:

// Review 847 lines of authentication code
// Mentally execute all edge cases
// Try to spot security vulnerabilities
// Guess at race conditions
// Wonder about error handling
*3 hours of intense concentration*

Verification review:

# Review test output
✅ User registration with valid data: PASSED
✅ Duplicate email rejection: PASSED
✅ Password strength validation: PASSED
✅ SQL injection prevention: PASSED
✅ Rate limiting (100 requests): PASSED
✅ Session expiration: PASSED
✅ Concurrent registration (race condition): PASSED
❌ Password reset token expiration: FAILED
   Expected: Token expires after 1 hour
   Actual: Token never expires

*30 seconds to spot the issue*

2. Verification is Systematic

Tests check every edge case, every time:

// AI generates comprehensive test suite
describe('User Authentication', () => {
  it('accepts valid email formats', () => { ... });
  it('rejects invalid email formats', () => { ... });
  it('requires password >= 8 characters', () => { ... });
  it('requires password with uppercase', () => { ... });
  it('requires password with number', () => { ... });
  it('requires password with special char', () => { ... });
  it('prevents SQL injection in email', () => { ... });
  it('prevents SQL injection in password', () => { ... });
  it('rate limits registration attempts', () => { ... });
  it('handles database connection errors', () => { ... });
  // ... 50+ more tests
});

Human review would skip most of these. Tests check them all, every time.

3. Verification Creates Artifacts

Tests become permanent quality gates:

Day 1: AI writes auth code + tests
       Tests pass ✅
       
Day 7: AI modifies auth code
       Tests catch regression ❌
       AI fixes issue
       Tests pass ✅
       
Day 30: AI refactors auth code
        Tests ensure behavior unchanged ✅

4. Compound Learning

Each verification teaches the AI what “correct” looks like:

Iteration 1:
  Code: Missing rate limiting
  Verification: Test fails
  Learning: "Rate limiting is required"
  
Iteration 2:
  Code: Includes rate limiting
  Verification: Test passes
  Learning: "This is the correct pattern"
  
Iteration 3+:
  Code: Automatically includes rate limiting
  Verification: Confirms it's correct

The AI gets better with each verification cycle.


Implementation

Essential Verification Patterns

Pattern 1: Runtime Verification

Use case: Verify code actually works as intended

Example prompt:

"Implement user authentication API endpoint.

After implementation, create a verification script that:
1. Starts the server
2. Tests all endpoints with valid/invalid data
3. Checks response codes and data
4. Verifies database state
5. Cleans up test data
6. Reports all results

Run the verification script and show me the output."

AI generates:

// verify-auth.ts
import { startServer, stopServer } from './test-server';
import { resetDatabase } from './test-utils';

async function verifyAuthentication() {
  console.log('🔍 Starting authentication verification...\n');
  
  await startServer();
  await resetDatabase();
  
  const results: { test: string; status: 'PASS' | 'FAIL'; reason?: string }[] = [];
  
  // Test 1: User registration
  try {
    const res = await fetch('http://localhost:3000/api/auth/register', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        email: 'test@example.com',
        password: 'SecurePass123!'
      })
    });
    
    if (res.status === 201) {
      const user = await res.json();
      if (user.id && user.email === 'test@example.com') {
        results.push({ test: 'User registration', status: 'PASS' });
      } else {
        results.push({ test: 'User registration', status: 'FAIL', reason: 'Invalid response data' });
      }
    } else {
      results.push({ test: 'User registration', status: 'FAIL', reason: `Expected 201, got ${res.status}` });
    }
  } catch (error) {
    results.push({ test: 'User registration', status: 'FAIL', reason: error.message });
  }
  
  // Test 2: Duplicate email rejection
  try {
    const res = await fetch('http://localhost:3000/api/auth/register', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        email: 'test@example.com',
        password: 'SecurePass123!'
      })
    });
    
    if (res.status === 409) {
      results.push({ test: 'Duplicate email rejection', status: 'PASS' });
    } else {
      results.push({ test: 'Duplicate email rejection', status: 'FAIL', reason: `Expected 409, got ${res.status}` });
    }
  } catch (error) {
    results.push({ test: 'Duplicate email rejection', status: 'FAIL', reason: error.message });
  }
  
  // Test 3: Password strength validation
  // Test 4: SQL injection prevention
  // Test 5: Rate limiting
  // ... more tests
  
  await stopServer();
  
  // Print results
  console.log('\n📊 Verification Results:\n');
  results.forEach(({ test, status, reason }) => {
    const icon = status === 'PASS' ? '✅' : '❌';
    console.log(`${icon} ${test}: ${status}`);
    if (reason) console.log(`   Reason: ${reason}`);
  });
  
  const passCount = results.filter(r => r.status === 'PASS').length;
  const failCount = results.filter(r => r.status === 'FAIL').length;
  
  console.log(`\n${passCount}/${results.length} tests passed`);
  
  if (failCount > 0) {
    process.exit(1);
  }
}

verifyAuthentication();

Your review: Scan output for failures, not code

Pattern 2: Visual Verification

Use case: Verify UI components render correctly

Example prompt:

"Implement a user profile page with:
- Avatar upload
- Name and bio editing
- Email change with verification
- Password reset

After implementation, create a Playwright script that:
1. Takes screenshots of all UI states (empty, filled, error states)
2. Tests all interactions (upload, edit, save, cancel)
3. Verifies form validation
4. Saves screenshots to ./screenshots/profile/

Run the script and show me the screenshots."

AI generates:

// verify-profile-ui.ts
import { chromium } from 'playwright';
import { startServer, stopServer } from './test-server';
import fs from 'fs';

async function verifyProfileUI() {
  await startServer();
  
  const browser = await chromium.launch();
  const page = await browser.newPage();
  
  // Ensure screenshot directory exists
  fs.mkdirSync('./screenshots/profile', { recursive: true });
  
  // Test 1: Empty state
  await page.goto('http://localhost:3000/profile');
  await page.screenshot({ path: './screenshots/profile/01-empty-state.png' });
  
  // Test 2: Filled state
  await page.fill('[name="name"]', 'John Doe');
  await page.fill('[name="bio"]', 'Software engineer passionate about AI');
  await page.screenshot({ path: './screenshots/profile/02-filled-state.png' });
  
  // Test 3: Validation errors
  await page.fill('[name="email"]', 'invalid-email');
  await page.click('button[type="submit"]');
  await page.screenshot({ path: './screenshots/profile/03-validation-errors.png' });
  
  // Test 4: Avatar upload
  await page.setInputFiles('[type="file"]', './test-fixtures/avatar.png');
  await page.screenshot({ path: './screenshots/profile/04-avatar-uploaded.png' });
  
  // Test 5: Success state
  await page.fill('[name="email"]', '[email protected]');
  await page.click('button[type="submit"]');
  await page.waitForSelector('.success-message');
  await page.screenshot({ path: './screenshots/profile/05-success-state.png' });
  
  await browser.close();
  await stopServer();
  
  console.log('✅ Screenshots saved to ./screenshots/profile/');
  console.log('📸 Review the following states:');
  console.log('  1. Empty state');
  console.log('  2. Filled state');
  console.log('  3. Validation errors');
  console.log('  4. Avatar uploaded');
  console.log('  5. Success state');
}

verifyProfileUI();

Your review: Look at 5 screenshots instead of reading 500 lines of React code

Pattern 3: Data Verification

Use case: Verify data migrations, transformations, or bulk operations

Example prompt:

"Create a script that migrates user data from old schema to new schema.

After implementation, create a verification script that:
1. Counts records before migration
2. Runs migration
3. Counts records after migration
4. Validates data integrity (no data loss)
5. Checks for duplicate records
6. Verifies all foreign keys are valid
7. Generates a detailed report

Run the verification and show me the report."

AI generates:

// verify-migration.ts
import { db } from './database';
import { runMigration } from './migrate'; // path assumed; point at your migration entry point

async function verifyMigration() {
  console.log('🔍 Verifying data migration...\n');
  
  // Pre-migration counts
  const beforeCounts = {
    users: await db.old_users.count(),
    profiles: await db.old_profiles.count(),
    posts: await db.old_posts.count(),
  };
  
  console.log('📊 Before migration:');
  console.log(`   Users: ${beforeCounts.users}`);
  console.log(`   Profiles: ${beforeCounts.profiles}`);
  console.log(`   Posts: ${beforeCounts.posts}\n`);
  
  // Run migration
  console.log('🚀 Running migration...');
  await runMigration();
  console.log('✅ Migration complete\n');
  
  // Post-migration counts
  const afterCounts = {
    users: await db.users.count(),
    profiles: await db.profiles.count(),
    posts: await db.posts.count(),
  };
  
  console.log('📊 After migration:');
  console.log(`   Users: ${afterCounts.users}`);
  console.log(`   Profiles: ${afterCounts.profiles}`);
  console.log(`   Posts: ${afterCounts.posts}\n`);
  
  // Verification checks
  const checks: { name: string; passed: boolean; expected: number; actual: number }[] = [];
  
  // Check 1: No data loss
  checks.push({
    name: 'No user data loss',
    passed: beforeCounts.users === afterCounts.users,
    expected: beforeCounts.users,
    actual: afterCounts.users,
  });
  
  // Check 2: No duplicates
  const duplicateEmails = await db.users
    .groupBy('email')
    .having('COUNT(*) > 1')
    .select('email', 'COUNT(*)');
  
  checks.push({
    name: 'No duplicate emails',
    passed: duplicateEmails.length === 0,
    expected: 0,
    actual: duplicateEmails.length,
  });
  
  // Check 3: Foreign key integrity
  const orphanedProfiles = await db.profiles
    .leftJoin('users', 'profiles.user_id', 'users.id')
    .whereNull('users.id')
    .count();
  
  checks.push({
    name: 'All profiles have valid user_id',
    passed: orphanedProfiles === 0,
    expected: 0,
    actual: orphanedProfiles,
  });
  
  // Print results
  console.log('🔍 Verification Checks:\n');
  checks.forEach(check => {
    const icon = check.passed ? '✅' : '❌';
    console.log(`${icon} ${check.name}`);
    if (!check.passed) {
      console.log(`   Expected: ${check.expected}`);
      console.log(`   Actual: ${check.actual}`);
    }
  });
  
  const allPassed = checks.every(c => c.passed);
  console.log(`\n${allPassed ? '✅ All checks passed' : '❌ Some checks failed'}`);
  
  if (!allPassed) process.exit(1);
}

verifyMigration();

Your review: Read the summary report, not database queries

Pattern 4: API Verification

Use case: Verify API endpoints work correctly

Example prompt:

"Implement a REST API for task management with CRUD operations.

After implementation, create a comprehensive API test suite that:
1. Tests all endpoints (GET, POST, PUT, DELETE)
2. Tests with valid and invalid data
3. Tests authentication and authorization
4. Tests rate limiting
5. Tests error handling
6. Generates a test coverage report

Run the tests and show me the results."

AI generates:

// verify-tasks-api.test.ts
import { describe, it, expect, beforeAll, afterAll, beforeEach } from 'vitest';
import { startServer, stopServer } from './test-server';
import { resetDatabase, createTestUser, createTask } from './test-utils'; // createTask assumed alongside the other helpers

describe('Tasks API Verification', () => {
  let authToken: string;
  let userId: string;
  
  beforeAll(async () => {
    await startServer();
  });
  
  afterAll(async () => {
    await stopServer();
  });
  
  beforeEach(async () => {
    await resetDatabase();
    const { token, id } = await createTestUser();
    authToken = token;
    userId = id;
  });
  
  describe('POST /api/tasks', () => {
    it('creates task with valid data', async () => {
      const res = await fetch('http://localhost:3000/api/tasks', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'Authorization': `Bearer ${authToken}`,
        },
        body: JSON.stringify({
          title: 'Test task',
          description: 'Task description',
        }),
      });
      
      expect(res.status).toBe(201);
      const task = await res.json();
      expect(task).toMatchObject({
        title: 'Test task',
        description: 'Task description',
        userId,
      });
    });
    
    it('rejects missing title', async () => {
      const res = await fetch('http://localhost:3000/api/tasks', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'Authorization': `Bearer ${authToken}`,
        },
        body: JSON.stringify({
          description: 'Task description',
        }),
      });
      
      expect(res.status).toBe(400);
    });
    
    it('requires authentication', async () => {
      const res = await fetch('http://localhost:3000/api/tasks', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ title: 'Test' }),
      });
      
      expect(res.status).toBe(401);
    });
  });
  
  describe('GET /api/tasks', () => {
    it('returns user tasks only', async () => {
      // Create tasks for this user
      await createTask(authToken, { title: 'Task 1' });
      await createTask(authToken, { title: 'Task 2' });
      
      // Create task for different user
      const { token: otherToken } = await createTestUser();
      await createTask(otherToken, { title: 'Other task' });
      
      const res = await fetch('http://localhost:3000/api/tasks', {
        headers: { 'Authorization': `Bearer ${authToken}` },
      });
      
      expect(res.status).toBe(200);
      const tasks = await res.json();
      expect(tasks).toHaveLength(2);
      expect(tasks.every(t => t.userId === userId)).toBe(true);
    });
  });
  
  // More tests for PUT, DELETE, etc...
});

Your review: Check test output, not API implementation code

Step-by-Step Workflow

Step 1: Request Implementation + Verification

Instead of:

"Implement user authentication"

Ask for:

"Implement user authentication.

After implementation, create a comprehensive verification suite that tests:
1. User registration (valid/invalid data)
2. Login (correct/incorrect credentials)
3. Password hashing (never stored plain text)
4. Session management (creation, validation, expiration)
5. Rate limiting (prevent brute force)
6. Security (SQL injection, XSS prevention)

Run the verification suite and show me the results."

Step 2: Review Verification Output

AI runs tests and shows:

✅ User registration with valid email: PASSED
✅ User registration rejects invalid email: PASSED
✅ User registration requires password >= 8 chars: PASSED
❌ Duplicate email handling: FAILED
   Expected: 409 Conflict
   Actual: 500 Internal Server Error
✅ Login with correct credentials: PASSED
❌ Login rate limiting after 5 attempts: FAILED
   Expected: 429 Too Many Requests after 5 attempts
   Actual: No rate limiting detected
✅ Password hashing verification: PASSED
✅ Session expiration after 24h: PASSED
✅ SQL injection prevention in email field: PASSED

7/9 tests passed, 2 failed

Your action: Scan for failures (takes 10 seconds)

Step 3: Request Fixes

"Fix the 3 failing tests:
1. Duplicate email should return 409, not 500
2. Implement rate limiting (5 attempts per 15 minutes)
3. Re-run verification after fixes"

Step 4: Verify Fixes

AI shows:

✅ All 9 tests passed

✅ User registration with valid email: PASSED
✅ User registration rejects invalid email: PASSED
✅ User registration requires password >= 8 chars: PASSED
✅ Duplicate email handling: PASSED (FIXED)
✅ Login with correct credentials: PASSED
✅ Login rate limiting after 5 attempts: PASSED (FIXED)
✅ Password hashing verification: PASSED
✅ Session expiration after 24h: PASSED
✅ SQL injection prevention in email field: PASSED

Your action: Confirm all tests pass (5 seconds)

Total review time: 15 seconds instead of 3 hours

Benefits

1. Reduced Review Burden

Before:

  • Review 1000 lines of code
  • Mentally execute edge cases
  • Try to spot bugs visually
  • Time: 2-4 hours

After:

  • Scan test output (10 lines)
  • See which tests pass/fail
  • Focus on failures only
  • Time: 30 seconds

Reduction: 99% less time

2. Higher Quality

Before:

  • Human review misses edge cases
  • No systematic coverage
  • Bugs slip through
  • Bug detection: 40-60%

After:

  • Automated tests check every case
  • Systematic coverage
  • Bugs caught immediately
  • Bug detection: 80-95%

Improvement: 2x better bug detection

3. Faster Iteration

Before:

Generate code (5 min) → Wait for review (hours/days) → Fix issues → Wait again
Cycle time: Days

After:

Generate code (5 min) → Generate verification (2 min) → Review output (30 sec) → Fix (5 min)
Cycle time: 15 minutes

Improvement: 100x faster iteration

4. Compound Learning

Verification creates a feedback loop:

Iteration 1: Generate code → Tests fail → Fix → Tests pass
             Learning: "This is what correct looks like"

Iteration 2: Generate code → Tests pass first time
             Learning: "I remember the correct pattern"

Iteration 3+: Generate increasingly correct code on first attempt
              Learning compounds over time

Result: AI gets better with each verification cycle

5. Regression Protection

Tests become permanent quality gates:

Day 1: Feature + tests created
Day 7: Refactoring doesn't break tests ✅
Day 30: New feature doesn't break existing tests ✅
Day 90: Still protected by original tests ✅

Best Practices

1. Always Request Verification

Make it a habit:

❌ "Implement feature X"

✅ "Implement feature X.
    After implementation, create verification that tests Y and Z.
    Run verification and show results."

2. Specify Verification Criteria

Be explicit about what to verify:

"Create a password reset flow.

Verification must test:
- ✅ Email validation
- ✅ Token generation and expiration (1 hour)
- ✅ Token can only be used once
- ✅ Password strength requirements
- ✅ Old password is actually changed
- ✅ User can login with new password
- ✅ Old password no longer works
- ✅ Rate limiting on reset requests
- ✅ Email sending (mock or real)

Run verification and show results."

3. Request Multiple Verification Types

Combine different verification patterns:

"Implement checkout flow.

Create verification suite with:
1. Integration tests (API endpoints)
2. Playwright tests (UI flow)
3. Data verification (order created in database)
4. Email verification (confirmation sent)

Run all verifications and show results."

4. Use Verification Output as Documentation

Test output documents behavior:

# This output IS the documentation
✅ Cart - Add item increases quantity
✅ Cart - Remove item decreases quantity
✅ Cart - Empty cart shows empty state
✅ Checkout - Validates credit card format
✅ Checkout - Calculates tax based on shipping address
✅ Checkout - Sends confirmation email
✅ Checkout - Creates order in database
✅ Checkout - Clears cart after successful order

Anyone reading this understands what the system does.
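
Those lines come straight from the test names, so the habit that makes them readable is naming tests after observable behavior rather than implementation details. A small sketch in vitest:

import { describe, it } from 'vitest';

// Names describe behavior the reader cares about, so the reporter output
// doubles as a living spec of the checkout flow.
describe('Checkout', () => {
  it('validates credit card format', async () => { /* ... */ });
  it('calculates tax based on shipping address', async () => { /* ... */ });
  it('sends confirmation email', async () => { /* ... */ });
  it('clears cart after successful order', async () => { /* ... */ });
});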

5. Keep Verification Scripts

Don’t throw away verification scripts:

project/
├── src/
│   └── features/
│       ├── auth/
│       │   ├── auth.service.ts
│       │   └── verify-auth.ts        ← Keep this
│       ├── checkout/
│       │   ├── checkout.service.ts
│       │   └── verify-checkout.ts    ← Keep this
│       └── user/
│           ├── user.service.ts
│           └── verify-user.ts        ← Keep this

Run them in CI/CD for continuous verification.
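
As a sketch, a small runner can discover those verify-*.ts scripts and fail the CI job if any of them exits non-zero. The directory layout matches the tree above; the tsx runner is an assumption about the project's tooling:

// run-verifications.ts -- invoke with: npx tsx run-verifications.ts
import { execSync } from 'node:child_process';
import { readdirSync } from 'node:fs';
import { join } from 'node:path';

// Walk src/features/* and collect verify-*.ts scripts (layout assumed above).
const featuresDir = join('src', 'features');
const scripts = readdirSync(featuresDir, { withFileTypes: true })
  .filter((entry) => entry.isDirectory())
  .flatMap((dir) =>
    readdirSync(join(featuresDir, dir.name))
      .filter((file) => file.startsWith('verify-') && file.endsWith('.ts'))
      .map((file) => join(featuresDir, dir.name, file)),
  );

let failed = 0;
for (const script of scripts) {
  console.log(`\n▶ Running ${script}`);
  try {
    execSync(`npx tsx ${script}`, { stdio: 'inherit' });
  } catch {
    failed += 1; // the script exited non-zero, i.e. a verification failed
  }
}

console.log(`\n${scripts.length - failed}/${scripts.length} verification scripts passed`);
if (failed > 0) process.exit(1);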

6. Review Verification Code Too

Occasionally review the verification code itself:

"Show me the verification code for the authentication tests.

I want to ensure:
1. All edge cases are covered
2. Tests are actually testing the right things
3. No false positives"

But this is much faster than reviewing implementation code.

Common Pitfalls

❌ Pitfall 1: Trusting Verification Without Running It

Problem: Assuming verification works without actually running it

AI: "I've created verification tests."
You: "Great!" *Ships to production*
Production: *Everything breaks*

Solution: Always require AI to run verification and show output

"Create verification tests, RUN THEM, and show me the output."

❌ Pitfall 2: Accepting Partial Verification

Problem: Only verifying happy path

// Incomplete verification
it('creates user', async () => {
  const user = await createUser('test@example.com', 'password');
  expect(user.email).toBe('test@example.com');
});

// Missing:
// - Duplicate email test
// - Invalid email test
// - Weak password test
// - SQL injection test
// - Rate limiting test

Solution: Explicitly request edge case coverage

"Verification must test:
- Happy path
- All error cases
- Edge cases (empty, null, max length)
- Security (injection, XSS)
- Performance (rate limiting, timeouts)"

❌ Pitfall 3: Not Fixing Failures Immediately

Problem: Seeing failures but fixing “later”

3 tests failed
You: "I'll fix those later"
*Context lost, takes 10x longer to fix*

Solution: Fix failures immediately while context is fresh

"3 tests failed. Fix them now and re-run verification."

❌ Pitfall 4: Over-Relying on Unit Tests

Problem: Generating unit tests with mocks instead of integration tests

// Low-value verification
const mockDb = { create: jest.fn() };
await service.createUser('test@example.com');
expect(mockDb.create).toHaveBeenCalled(); // Meaningless

Solution: Prefer integration tests that verify real behavior

"Create INTEGRATION tests that verify the actual API endpoints
with a real test database, not mocked dependencies."

Integration with Other Patterns

Combine with Integration Tests

Trust But Verify works best with integration tests:

"Implement payment processing.

Create integration tests that:
1. Start test server with test database
2. Test complete payment flows
3. Verify database state after each operation
4. Test with real Stripe test mode

Run tests and show results."

Combine with Claude Code Hooks

Automate verification in hooks:

# .claude/hooks/post-write
#!/bin/bash

# Run verification after any code change
if [[ "$CLAUDE_FILES_CHANGED" == *"auth.service.ts"* ]]; then
  echo "Running auth verification..."
  npm run verify:auth
fi

Combine with Evaluation Driven Development

Use verification as your evaluation:

"Implement feature X.

Evaluation criteria (must pass):
1. All integration tests pass
2. All Playwright tests pass
3. All security tests pass
4. All performance tests pass

Only mark complete when ALL evaluations pass."

Measuring Success

Key Metrics

1. Review Time Reduction

Before: 3 hours reviewing 1000 lines of code
After: 30 seconds reviewing test output
Reduction: 99.7%

2. Bug Detection Rate

Before: Manual review catches 40-60% of bugs
After: Automated verification catches 80-95% of bugs
Improvement: 2x better

3. Iteration Speed

Before: 1-2 iterations per day (waiting for review)
After: 10-20 iterations per day (immediate verification)
Improvement: 10x faster

4. Regression Rate

Before: 20-30% of bugs are regressions
After: <5% regressions (tests prevent them)
Improvement: 6x fewer regressions

Tracking Dashboard

interface VerificationMetrics {
  totalVerifications: number;
  passRate: number;
  avgFixTime: number; // minutes
  bugsPreventedByVerification: number;
  reviewTimeSaved: number; // hours
}

const metrics: VerificationMetrics = {
  totalVerifications: 347,
  passRate: 0.73, // 73% pass first time
  avgFixTime: 8, // 8 minutes to fix failures
  bugsPreventedByVerification: 234,
  reviewTimeSaved: 520, // hours
};

Conclusion

The Trust But Verify Protocol shifts your role from code reviewer to quality validator:

Old approach:

  • Read every line of generated code
  • Mentally execute edge cases
  • Try to spot bugs visually
  • Result: Slow, error-prone, tedious

Trust But Verify:

  • AI generates code + verification
  • Scan verification output
  • Fix failures immediately
  • Result: Fast, systematic, effective

The pattern:

1. AI writes code
2. AI writes verification (tests, scripts, checks)
3. AI runs verification
4. You review output (not code)
5. Fix failures while context is fresh
6. Verification becomes permanent quality gate

The benefits:

  • ✅ 99% reduction in review time
  • ✅ 2x better bug detection
  • ✅ 10x faster iteration
  • ✅ Compound learning (AI improves over time)
  • ✅ Regression protection (tests prevent backsliding)

The mindset shift:

From: "I need to review this code to ensure it's correct"
To: "I need to see evidence this code works correctly"

Don’t trust AI output. But don’t manually review everything either.

Trust, but verify. Through automation.

Related Concepts

  • Integration Tests Over Unit Tests: Prefer integration tests for higher-signal verification
  • Evaluation Driven Development: Use verification as evaluation criteria
  • Test-Based Regression Patching: Write tests that make bugs illegal
  • Claude Code Hooks as Quality Gates: Automate verification in development hooks
  • Playwright Script Loop: Generate visual verification scripts for UI testing

