Types → Schema → Unit tests → Property tests → Formal verification. Each rung catches what the lower rungs miss.
The Ladder
┌─────────────────────────────────────────────────────────────┐
│ LEVEL 6: Formal Verification (TLA+, Alloy, Z3) │
│ "Prove it's impossible to violate" │
├─────────────────────────────────────────────────────────────┤
│ LEVEL 5: Property-Based Testing (Hypothesis, fast-check) │
│ "Test with thousands of generated inputs" │
├─────────────────────────────────────────────────────────────┤
│ LEVEL 4: Integration Tests │
│ "Test components working together" │
├─────────────────────────────────────────────────────────────┤
│ LEVEL 3: Unit Tests │
│ "Test individual functions" │
├─────────────────────────────────────────────────────────────┤
│ LEVEL 2: Runtime Validation (Zod, io-ts) │
│ "Validate data at boundaries" │
├─────────────────────────────────────────────────────────────┤
│ LEVEL 1: Static Types (TypeScript, mypy) │
│ "Catch errors at compile time" │
└─────────────────────────────────────────────────────────────┘
▲ Confidence
│
│ Each level catches what lower levels miss
│
Level 1: Static Types
What it catches: Shape mismatches, null errors, wrong argument types
// Types catch this at compile time
function processUser(user: User): void {
console.log(user.name.toUpperCase());
}
processUser({ id: 1 }); // ✗ Error: Property 'name' is missing
processUser(null); // ✗ Error: Argument cannot be null
Limitations: Can’t enforce runtime values, business logic, or dynamic constraints.
Level 2: Runtime Validation (Zod/Schema)
What it catches: Invalid data from external sources, malformed inputs
import { z } from 'zod';
const UserSchema = z.object({
id: z.string().uuid(),
email: z.string().email(),
age: z.number().int().min(0).max(150),
role: z.enum(['admin', 'user', 'guest']),
});
// Runtime validation at API boundary
app.post('/users', (req, res) => {
const result = UserSchema.safeParse(req.body);
if (!result.success) {
return res.status(400).json({ errors: result.error.issues });
}
// result.data is now typed and validated
});
Limitations: Only validates data shape, not behavioral properties.
Level 3: Unit Tests
What it catches: Logic errors in individual functions
describe('calculateDiscount', () => {
it('applies 10% discount for orders over $100', () => {
expect(calculateDiscount(150)).toBe(15);
});
it('applies no discount for orders under $100', () => {
expect(calculateDiscount(50)).toBe(0);
});
it('handles edge case at exactly $100', () => {
expect(calculateDiscount(100)).toBe(10);
});
});
Limitations: Only tests the cases you think of. Misses edge cases you didn’t anticipate.
Level 4: Integration Tests
What it catches: Component interaction bugs, configuration errors
describe('User Registration Flow', () => {
it('creates user and sends welcome email', async () => {
const response = await request(app)
.post('/api/register')
.send({ email: '[email protected]', password: 'secure123' });
expect(response.status).toBe(201);
expect(await db.users.findByEmail('[email protected]')).toBeTruthy();
expect(emailService.sent).toContainEqual(
expect.objectContaining({ to: '[email protected]', template: 'welcome' })
);
});
});
Limitations: Still example-based. Can’t test all possible states.
Level 5: Property-Based Testing
What it catches: Edge cases you didn’t think of, invariant violations
from hypothesis import given, strategies as st
@given(
items=st.lists(st.integers()),
)
def test_sort_properties(items):
"""Test invariants that must ALWAYS hold"""
result = my_sort(items)
# Property 1: Output has same length as input
assert len(result) == len(items)
# Property 2: Output contains same elements
assert sorted(result) == sorted(items)
# Property 3: Output is actually sorted
assert all(result[i] <= result[i+1] for i in range(len(result)-1))
# Property 4: Idempotent
assert my_sort(result) == result
// TypeScript with fast-check
import fc from 'fast-check';
test('serialization roundtrip', () => {
fc.assert(
fc.property(fc.anything(), (value) => {
const serialized = serialize(value);
const deserialized = deserialize(serialized);
expect(deserialized).toEqual(value);
})
);
});
What it generates: Thousands of test cases including:
- Empty inputs
- Very large inputs
- Unicode edge cases
- Boundary values
- Null/undefined
- Deeply nested structures
Limitations: Can’t prove absence of bugs, only find them probabilistically.
Level 6: Formal Verification
What it catches: Proves properties hold for ALL possible inputs
TLA+ Example (Distributed Systems)
---- MODULE RateLimiter ----
VARIABLES requests, window_start, count
TypeInvariant ==
/\ requests \in Nat
/\ count \in 0..MAX_REQUESTS
/\ window_start \in Nat
SafetyInvariant ==
count <= MAX_REQUESTS \* NEVER exceeded
Init ==
/\ requests = 0
/\ window_start = 0
/\ count = 0
AllowRequest ==
/\ count < MAX_REQUESTS
/\ count' = count + 1
/\ requests' = requests + 1
/\ UNCHANGED window_start
RejectRequest ==
/\ count >= MAX_REQUESTS
/\ UNCHANGED <<requests, window_start, count>>
WindowReset ==
/\ window_start' = window_start + WINDOW_SIZE
/\ count' = 0
/\ UNCHANGED requests
Next == AllowRequest \/ RejectRequest \/ WindowReset
Spec == Init /\ [][Next]_<<requests, window_start, count>>
THEOREM Spec => []SafetyInvariant
====
Z3 Example (Constraint Solving)
from z3 import *
def verify_no_overflow(max_value: int):
"""Prove that our calculation never overflows"""
x = Int('x')
y = Int('y')
solver = Solver()
# Preconditions
solver.add(x >= 0, x <= max_value)
solver.add(y >= 0, y <= max_value)
# Try to find a case where result overflows
result = x + y
solver.add(result > 2**31 - 1) # 32-bit overflow
if solver.check() == sat:
model = solver.model()
print(f"OVERFLOW POSSIBLE: x={model[x]}, y={model[y]}")
return False
else:
print("PROVEN: No overflow possible")
return True
Limitations: Expensive to write, limited to specific properties, requires expertise.
Choosing the Right Level
| Scenario | Minimum Level |
|---|---|
| Internal utility function | Level 3 (Unit tests) |
| API endpoint | Level 2 (Schema) + Level 4 (Integration) |
| Financial calculations | Level 5 (Property tests) |
| Security-critical code | Level 5 + manual audit |
| Distributed consensus | Level 6 (Formal verification) |
| Life-critical systems | Level 6 (Formal verification) |
The Quality Gate Stack
Layer verification for maximum coverage:
# .github/workflows/verify.yml
jobs:
level-1-types:
run: tsc --noEmit
level-2-schema:
run: npm run validate-schemas
level-3-unit:
run: npm test -- --coverage
level-4-integration:
run: npm run test:integration
level-5-property:
run: npm run test:property
level-6-formal:
run: tlc RateLimiter.tla # If applicable
Cost vs Confidence
▲ Confidence
│
100% │ ████ Formal
│ ████
95% │ ████ Property
│ ████
80% │ ████ Integration
│ ████
60% │████ Unit + Schema + Types
│
└────────────────────────────────────▶ Cost/Effort
The insight: You get 80% confidence from levels 1-3 at low cost. Levels 5-6 give the last 20% but cost 5x more.
Choose based on risk tolerance.
For Agent-Driven Development
Tell agents which level to verify at:
# .claude/commands/implement-with-verification.md
Implement the feature with verification at these levels:
1. Types: Full TypeScript strict mode
2. Schema: Zod validation on all inputs
3. Unit tests: Cover happy path + error cases
4. Property tests: Test these invariants:
- [List invariants here]
Run all verification before marking complete.
Key Insight
Each level catches bugs that lower levels miss. The question isn’t “which level?” but “how high do we need to go for this risk?”
Related
- Constraint-First Development – Expressing constraints
- Property-Based Testing – Level 5 deep dive
- Closed-Loop Optimization – Runtime verification
- Highest Leverage – Validation as leverage
- Verification Sandwich Pattern – Establish baseline before and after code generation
- Integration Testing Patterns – Level 4 implementation strategies
- Test-Driven Prompting – Write tests before generating code
- Stateless Verification Loops – Prevent state accumulation in verification

