The Verification Ladder

James Phoenix

Types → Schema → Unit tests → Property tests → Formal verification. Each rung catches what the lower rungs miss.

The Ladder

┌─────────────────────────────────────────────────────────────┐
│  LEVEL 6: Formal Verification (TLA+, Alloy, Z3)            │
│  "Prove it's impossible to violate"                         │
├─────────────────────────────────────────────────────────────┤
│  LEVEL 5: Property-Based Testing (Hypothesis, fast-check)  │
│  "Test with thousands of generated inputs"                  │
├─────────────────────────────────────────────────────────────┤
│  LEVEL 4: Integration Tests                                 │
│  "Test components working together"                         │
├─────────────────────────────────────────────────────────────┤
│  LEVEL 3: Unit Tests                                        │
│  "Test individual functions"                                │
├─────────────────────────────────────────────────────────────┤
│  LEVEL 2: Runtime Validation (Zod, io-ts)                  │
│  "Validate data at boundaries"                              │
├─────────────────────────────────────────────────────────────┤
│  LEVEL 1: Static Types (TypeScript, mypy)                  │
│  "Catch errors at compile time"                             │
└─────────────────────────────────────────────────────────────┘

              ▲ Confidence
              │
              │ Each level catches what lower levels miss
              │

Level 1: Static Types

What it catches: Shape mismatches, null errors, wrong argument types

// Types catch this at compile time
function processUser(user: User): void {
  console.log(user.name.toUpperCase());
}

processUser({ id: 1 }); // ✗ Error: Property 'name' is missing
processUser(null);       // ✗ Error: Argument cannot be null

Limitations: Can’t enforce runtime values, business logic, or dynamic constraints.

Level 2: Runtime Validation (Zod/Schema)

What it catches: Invalid data from external sources, malformed inputs

import { z } from 'zod';

const UserSchema = z.object({
  id: z.string().uuid(),
  email: z.string().email(),
  age: z.number().int().min(0).max(150),
  role: z.enum(['admin', 'user', 'guest']),
});

// Runtime validation at API boundary
app.post('/users', (req, res) => {
  const result = UserSchema.safeParse(req.body);
  if (!result.success) {
    return res.status(400).json({ errors: result.error.issues });
  }
  // result.data is now typed and validated
});

Limitations: Only validates data shape, not behavioral properties.

Level 3: Unit Tests

What it catches: Logic errors in individual functions

describe('calculateDiscount', () => {
  it('applies 10% discount for orders over $100', () => {
    expect(calculateDiscount(150)).toBe(15);
  });

  it('applies no discount for orders under $100', () => {
    expect(calculateDiscount(50)).toBe(0);
  });

  it('handles edge case at exactly $100', () => {
    expect(calculateDiscount(100)).toBe(10);
  });
});

Limitations: Only tests the cases you think of. Misses edge cases you didn’t anticipate.

Udemy Bestseller

Learn Prompt Engineering

My O'Reilly book adapted for hands-on learning. Build production-ready prompts with practical exercises.

★ 4.5/5 rating

306,000+ learners

View Course

Level 4: Integration Tests

What it catches: Component interaction bugs, configuration errors

describe('User Registration Flow', () => {
  it('creates user and sends welcome email', async () => {
    const response = await request(app)
      .post('/api/register')
      .send({ email: '[email protected]', password: 'secure123' });

    expect(response.status).toBe(201);
    expect(await db.users.findByEmail('[email protected]')).toBeTruthy();
    expect(emailService.sent).toContainEqual(
      expect.objectContaining({ to: '[email protected]', template: 'welcome' })
    );
  });
});

Limitations: Still example-based. Can’t test all possible states.

Level 5: Property-Based Testing

What it catches: Edge cases you didn’t think of, invariant violations

from hypothesis import given, strategies as st

@given(
    items=st.lists(st.integers()),
)
def test_sort_properties(items):
    """Test invariants that must ALWAYS hold"""
    result = my_sort(items)

    # Property 1: Output has same length as input
    assert len(result) == len(items)

    # Property 2: Output contains same elements
    assert sorted(result) == sorted(items)

    # Property 3: Output is actually sorted
    assert all(result[i] <= result[i+1] for i in range(len(result)-1))

    # Property 4: Idempotent
    assert my_sort(result) == result

// TypeScript with fast-check
import fc from 'fast-check';

test('serialization roundtrip', () => {
  fc.assert(
    fc.property(fc.anything(), (value) => {
      const serialized = serialize(value);
      const deserialized = deserialize(serialized);
      expect(deserialized).toEqual(value);
    })
  );
});

What it generates: Thousands of test cases including:

Empty inputs
Very large inputs
Unicode edge cases
Boundary values
Null/undefined
Deeply nested structures

Limitations: Can’t prove absence of bugs, only find them probabilistically.

Level 6: Formal Verification

What it catches: Proves properties hold for ALL possible inputs

TLA+ Example (Distributed Systems)

---- MODULE RateLimiter ----
VARIABLES requests, window_start, count

TypeInvariant ==
  /\ requests \in Nat
  /\ count \in 0..MAX_REQUESTS
  /\ window_start \in Nat

SafetyInvariant ==
  count <= MAX_REQUESTS  \* NEVER exceeded

Init ==
  /\ requests = 0
  /\ window_start = 0
  /\ count = 0

AllowRequest ==
  /\ count < MAX_REQUESTS
  /\ count' = count + 1
  /\ requests' = requests + 1
  /\ UNCHANGED window_start

RejectRequest ==
  /\ count >= MAX_REQUESTS
  /\ UNCHANGED <<requests, window_start, count>>

WindowReset ==
  /\ window_start' = window_start + WINDOW_SIZE
  /\ count' = 0
  /\ UNCHANGED requests

Next == AllowRequest \/ RejectRequest \/ WindowReset

Spec == Init /\ [][Next]_<<requests, window_start, count>>

THEOREM Spec => []SafetyInvariant
====

Z3 Example (Constraint Solving)

from z3 import *

def verify_no_overflow(max_value: int):
    """Prove that our calculation never overflows"""
    x = Int('x')
    y = Int('y')

    solver = Solver()

    # Preconditions
    solver.add(x >= 0, x <= max_value)
    solver.add(y >= 0, y <= max_value)

    # Try to find a case where result overflows
    result = x + y
    solver.add(result > 2**31 - 1)  # 32-bit overflow

    if solver.check() == sat:
        model = solver.model()
        print(f"OVERFLOW POSSIBLE: x={model[x]}, y={model[y]}")
        return False
    else:
        print("PROVEN: No overflow possible")
        return True

Limitations: Expensive to write, limited to specific properties, requires expertise.

Choosing the Right Level

Scenario	Minimum Level
Internal utility function	Level 3 (Unit tests)
API endpoint	Level 2 (Schema) + Level 4 (Integration)
Financial calculations	Level 5 (Property tests)
Security-critical code	Level 5 + manual audit
Distributed consensus	Level 6 (Formal verification)
Life-critical systems	Level 6 (Formal verification)

The Quality Gate Stack

Layer verification for maximum coverage:

# .github/workflows/verify.yml
jobs:
  level-1-types:
    run: tsc --noEmit

  level-2-schema:
    run: npm run validate-schemas

  level-3-unit:
    run: npm test -- --coverage

  level-4-integration:
    run: npm run test:integration

  level-5-property:
    run: npm run test:property

  level-6-formal:
    run: tlc RateLimiter.tla  # If applicable

Cost vs Confidence

        ▲ Confidence
        │
   100% │                              ████ Formal
        │                         ████
    95% │                    ████ Property
        │               ████
    80% │          ████ Integration
        │     ████
    60% │████ Unit + Schema + Types
        │
        └────────────────────────────────────▶ Cost/Effort

The insight: You get 80% confidence from levels 1-3 at low cost. Levels 5-6 give the last 20% but cost 5x more.

Choose based on risk tolerance.

For Agent-Driven Development

Tell agents which level to verify at:

# .claude/commands/implement-with-verification.md
Implement the feature with verification at these levels:

1. Types: Full TypeScript strict mode
2. Schema: Zod validation on all inputs
3. Unit tests: Cover happy path + error cases
4. Property tests: Test these invariants:
   - [List invariants here]

Run all verification before marking complete.

Key Insight

Each level catches bugs that lower levels miss. The question isn’t “which level?” but “how high do we need to go for this risk?”

Constraint-First Development – Expressing constraints
Property-Based Testing – Level 5 deep dive
Closed-Loop Optimization – Runtime verification
Highest Leverage – Validation as leverage
Verification Sandwich Pattern – Establish baseline before and after code generation
Integration Testing Patterns – Level 4 implementation strategies
Test-Driven Prompting – Write tests before generating code
Stateless Verification Loops – Prevent state accumulation in verification