The Verification Ladder

James Phoenix
James Phoenix

Types → Schema → Unit tests → Property tests → Formal verification. Each rung catches what the lower rungs miss.


The Ladder

┌─────────────────────────────────────────────────────────────┐
│  LEVEL 6: Formal Verification (TLA+, Alloy, Z3)            │
│  "Prove it's impossible to violate"                         │
├─────────────────────────────────────────────────────────────┤
│  LEVEL 5: Property-Based Testing (Hypothesis, fast-check)  │
│  "Test with thousands of generated inputs"                  │
├─────────────────────────────────────────────────────────────┤
│  LEVEL 4: Integration Tests                                 │
│  "Test components working together"                         │
├─────────────────────────────────────────────────────────────┤
│  LEVEL 3: Unit Tests                                        │
│  "Test individual functions"                                │
├─────────────────────────────────────────────────────────────┤
│  LEVEL 2: Runtime Validation (Zod, io-ts)                  │
│  "Validate data at boundaries"                              │
├─────────────────────────────────────────────────────────────┤
│  LEVEL 1: Static Types (TypeScript, mypy)                  │
│  "Catch errors at compile time"                             │
└─────────────────────────────────────────────────────────────┘

              ▲ Confidence
              │
              │ Each level catches what lower levels miss
              │

Level 1: Static Types

What it catches: Shape mismatches, null errors, wrong argument types

// Types catch this at compile time
function processUser(user: User): void {
  console.log(user.name.toUpperCase());
}

processUser({ id: 1 }); // ✗ Error: Property 'name' is missing
processUser(null);       // ✗ Error: Argument cannot be null

Limitations: Can’t enforce runtime values, business logic, or dynamic constraints.


Level 2: Runtime Validation (Zod/Schema)

What it catches: Invalid data from external sources, malformed inputs

import { z } from 'zod';

const UserSchema = z.object({
  id: z.string().uuid(),
  email: z.string().email(),
  age: z.number().int().min(0).max(150),
  role: z.enum(['admin', 'user', 'guest']),
});

// Runtime validation at API boundary
app.post('/users', (req, res) => {
  const result = UserSchema.safeParse(req.body);
  if (!result.success) {
    return res.status(400).json({ errors: result.error.issues });
  }
  // result.data is now typed and validated
});

Limitations: Only validates data shape, not behavioral properties.


Level 3: Unit Tests

What it catches: Logic errors in individual functions

describe('calculateDiscount', () => {
  it('applies 10% discount for orders over $100', () => {
    expect(calculateDiscount(150)).toBe(15);
  });

  it('applies no discount for orders under $100', () => {
    expect(calculateDiscount(50)).toBe(0);
  });

  it('handles edge case at exactly $100', () => {
    expect(calculateDiscount(100)).toBe(10);
  });
});

Limitations: Only tests the cases you think of. Misses edge cases you didn’t anticipate.


Level 4: Integration Tests

What it catches: Component interaction bugs, configuration errors

describe('User Registration Flow', () => {
  it('creates user and sends welcome email', async () => {
    const response = await request(app)
      .post('/api/register')
      .send({ email: '[email protected]', password: 'secure123' });

    expect(response.status).toBe(201);
    expect(await db.users.findByEmail('[email protected]')).toBeTruthy();
    expect(emailService.sent).toContainEqual(
      expect.objectContaining({ to: '[email protected]', template: 'welcome' })
    );
  });
});

Limitations: Still example-based. Can’t test all possible states.

Udemy Bestseller

Learn Prompt Engineering

My O'Reilly book adapted for hands-on learning. Build production-ready prompts with practical exercises.

4.5/5 rating
306,000+ learners
View Course

Level 5: Property-Based Testing

What it catches: Edge cases you didn’t think of, invariant violations

from hypothesis import given, strategies as st

@given(
    items=st.lists(st.integers()),
)
def test_sort_properties(items):
    """Test invariants that must ALWAYS hold"""
    result = my_sort(items)

    # Property 1: Output has same length as input
    assert len(result) == len(items)

    # Property 2: Output contains same elements
    assert sorted(result) == sorted(items)

    # Property 3: Output is actually sorted
    assert all(result[i] <= result[i+1] for i in range(len(result)-1))

    # Property 4: Idempotent
    assert my_sort(result) == result
// TypeScript with fast-check
import fc from 'fast-check';

test('serialization roundtrip', () => {
  fc.assert(
    fc.property(fc.anything(), (value) => {
      const serialized = serialize(value);
      const deserialized = deserialize(serialized);
      expect(deserialized).toEqual(value);
    })
  );
});

What it generates: Thousands of test cases including:

  • Empty inputs
  • Very large inputs
  • Unicode edge cases
  • Boundary values
  • Null/undefined
  • Deeply nested structures

Limitations: Can’t prove absence of bugs, only find them probabilistically.


Level 6: Formal Verification

What it catches: Proves properties hold for ALL possible inputs

TLA+ Example (Distributed Systems)

---- MODULE RateLimiter ----
VARIABLES requests, window_start, count

TypeInvariant ==
  /\ requests \in Nat
  /\ count \in 0..MAX_REQUESTS
  /\ window_start \in Nat

SafetyInvariant ==
  count <= MAX_REQUESTS  \* NEVER exceeded

Init ==
  /\ requests = 0
  /\ window_start = 0
  /\ count = 0

AllowRequest ==
  /\ count < MAX_REQUESTS
  /\ count' = count + 1
  /\ requests' = requests + 1
  /\ UNCHANGED window_start

RejectRequest ==
  /\ count >= MAX_REQUESTS
  /\ UNCHANGED <<requests, window_start, count>>

WindowReset ==
  /\ window_start' = window_start + WINDOW_SIZE
  /\ count' = 0
  /\ UNCHANGED requests

Next == AllowRequest \/ RejectRequest \/ WindowReset

Spec == Init /\ [][Next]_<<requests, window_start, count>>

THEOREM Spec => []SafetyInvariant
====

Z3 Example (Constraint Solving)

from z3 import *

def verify_no_overflow(max_value: int):
    """Prove that our calculation never overflows"""
    x = Int('x')
    y = Int('y')

    solver = Solver()

    # Preconditions
    solver.add(x >= 0, x <= max_value)
    solver.add(y >= 0, y <= max_value)

    # Try to find a case where result overflows
    result = x + y
    solver.add(result > 2**31 - 1)  # 32-bit overflow

    if solver.check() == sat:
        model = solver.model()
        print(f"OVERFLOW POSSIBLE: x={model[x]}, y={model[y]}")
        return False
    else:
        print("PROVEN: No overflow possible")
        return True

Limitations: Expensive to write, limited to specific properties, requires expertise.


Choosing the Right Level

Scenario Minimum Level
Internal utility function Level 3 (Unit tests)
API endpoint Level 2 (Schema) + Level 4 (Integration)
Financial calculations Level 5 (Property tests)
Security-critical code Level 5 + manual audit
Distributed consensus Level 6 (Formal verification)
Life-critical systems Level 6 (Formal verification)

The Quality Gate Stack

Layer verification for maximum coverage:

# .github/workflows/verify.yml
jobs:
  level-1-types:
    run: tsc --noEmit

  level-2-schema:
    run: npm run validate-schemas

  level-3-unit:
    run: npm test -- --coverage

  level-4-integration:
    run: npm run test:integration

  level-5-property:
    run: npm run test:property

  level-6-formal:
    run: tlc RateLimiter.tla  # If applicable

Cost vs Confidence

        ▲ Confidence
        │
   100% │                              ████ Formal
        │                         ████
    95% │                    ████ Property
        │               ████
    80% │          ████ Integration
        │     ████
    60% │████ Unit + Schema + Types
        │
        └────────────────────────────────────▶ Cost/Effort

The insight: You get 80% confidence from levels 1-3 at low cost. Levels 5-6 give the last 20% but cost 5x more.

Choose based on risk tolerance.


For Agent-Driven Development

Tell agents which level to verify at:

# .claude/commands/implement-with-verification.md
Implement the feature with verification at these levels:

1. Types: Full TypeScript strict mode
2. Schema: Zod validation on all inputs
3. Unit tests: Cover happy path + error cases
4. Property tests: Test these invariants:
   - [List invariants here]

Run all verification before marking complete.

Key Insight

Each level catches bugs that lower levels miss. The question isn’t “which level?” but “how high do we need to go for this risk?”


Related

Topics
Formal VerificationProperty Based TestingRuntime ValidationStatic TypesUnit Testing

More Insights

Cover Image for Thought Leaders

Thought Leaders

People to follow for compound engineering, context engineering, and AI agent development.

James Phoenix
James Phoenix
Cover Image for Systems Thinking & Observability

Systems Thinking & Observability

Software should be treated as a measurable dynamical system, not as a collection of features.

James Phoenix
James Phoenix