Property-Based Testing for LLM-Generated Code: Catching Edge Cases Automatically

James Phoenix

Summary

LLM-generated code often fails on edge cases that example-based tests don’t cover. Property-based testing uses libraries like fast-check to generate many random inputs (100 per property by default in fast-check) and check invariants automatically, catching bugs that LLMs miss. Instead of asserting that one hand-picked password passes, you state a property that should always hold, such as ‘every string of 8 or more characters is accepted’, and the framework searches for an input that breaks it.

The Problem

Example-based tests validate specific inputs but miss edge cases like empty strings, unicode characters, boundary values, and unexpected formats. LLMs generate code that works for provided examples but often fails on edge cases not explicitly tested. Writing exhaustive example tests is tedious and still incomplete.

The Solution

Property-based testing generates many random inputs and checks that invariants (properties that should always be true) hold for all of them. Instead of testing ‘validatePassword("12345678") === true’, you test ‘all strings >= 8 chars should validate as true’. Passing runs are evidence rather than proof, but a single failing input is a concrete counterexample. Frameworks like fast-check (JavaScript/TypeScript), Hypothesis (Python), and QuickCheck (Haskell) automate edge case discovery and shrink a failing input to a smaller one that is easier to debug.
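
To make the mechanics concrete, here is a minimal, dependency-free sketch of what such a framework does under the hood. The helpers `makeRng`, `randomString`, and `findCounterexample` are hypothetical names invented for this illustration; they are not part of fast-check. The `validatePassword` function is assumed to be the length-only rule used throughout this article.

```typescript
// Implementation under test (assumed): the length-only rule from the article.
function validatePassword(password: string): boolean {
  return password.length >= 8;
}

// Hypothetical helper: a small seeded linear congruential generator,
// so every run produces the same sequence of inputs.
function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // value in [0, 1)
  };
}

// Hypothetical helper: a random printable-ASCII string whose length
// lies between minLength and maxLength (inclusive).
function randomString(rng: () => number, minLength: number, maxLength: number): string {
  const length = minLength + Math.floor(rng() * (maxLength - minLength + 1));
  let out = "";
  for (let i = 0; i < length; i++) {
    out += String.fromCharCode(0x20 + Math.floor(rng() * 95)); // 0x20..0x7e
  }
  return out;
}

// Hypothetical helper: run the property against `runs` generated values
// and return the first value that violates it, or null if none is found.
function findCounterexample<T>(
  generate: (rng: () => number) => T,
  property: (value: T) => boolean,
  runs: number = 100,
  seed: number = 42,
): { counterexample: T } | null {
  const rng = makeRng(seed);
  for (let i = 0; i < runs; i++) {
    const value = generate(rng);
    if (!property(value)) {
      return { counterexample: value };
    }
  }
  return null;
}

// Property 1: every string of 8 or more characters is accepted.
const longStringsPass = findCounterexample(
  (rng) => randomString(rng, 8, 40),
  (s) => validatePassword(s) === true,
);

// Property 2: every string shorter than 8 characters is rejected.
const shortStringsFail = findCounterexample(
  (rng) => randomString(rng, 0, 7),
  (s) => validatePassword(s) === false,
);

console.log({ longStringsPass, shortStringsFail });
```

Both results are `null` here because the implementation matches the two properties exactly. In fast-check, the roles of `findCounterexample` and `randomString` are played by `fc.assert`, `fc.property`, and arbitraries such as `fc.string`, with shrinking and a much wider input distribution on top.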

The Problem

When testing LLM-generated code, developers typically write example-based tests:

test('password validation', () => {
  expect(validatePassword('12345678')).toBe(true);
  expect(validatePassword('short')).toBe(false);
  expect(validatePassword('verylongpassword123')).toBe(true);
});

This approach has a critical flaw: you only test the examples you thought of.

What Example Tests Miss

Edge cases that break LLM-generated code:

  1. Boundary values: What about exactly 8 characters? 7 characters?
  2. Empty inputs: Empty string, null, undefined
  3. Unicode: Emojis, special characters, multi-byte unicode
  4. Whitespace: Leading/trailing spaces, tabs, newlines
  5. Special formats: HTML entities, encoded strings, injection attempts
  6. Numeric boundaries: Number.MAX_SAFE_INTEGER, Number.MIN_SAFE_INTEGER, Infinity, NaN
  7. Type coercion: Numbers as strings, booleans as strings
  8. Array edge cases: Empty arrays, single-element arrays, very large arrays
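
Several of these categories are easy to underestimate in JavaScript and TypeScript because `length` counts UTF-16 code units, not the characters a user sees. The sketch below measures a few inputs taken from the list above. The `edgeCases` catalogue and the `measure` helper are hypothetical names used only for illustration.

```typescript
// Hypothetical catalogue of inputs drawn from the categories above.
const edgeCases = {
  boundaryBelow: "a".repeat(7),
  boundaryExact: "a".repeat(8),
  empty: "",
  emoji: "😀".repeat(4),
  whitespace: " \t\n ".repeat(2),
  nullBytes: "\u0000".repeat(8),
};

// Hypothetical helper: three different notions of "how long is this string".
function measure(input: string): { codeUnits: number; codePoints: number; trimmedLength: number } {
  return {
    codeUnits: input.length,               // what a `length >= 8` check sees
    codePoints: Array.from(input).length,  // Unicode code points
    trimmedLength: input.trim().length,    // length after stripping surrounding whitespace
  };
}

for (const [label, input] of Object.entries(edgeCases)) {
  console.log(label, measure(input));
}

// Four emoji occupy 8 code units, so a length-only rule treats them as 8 characters.
console.log(edgeCases.emoji.length >= 8); // true
```

A length-only check therefore accepts four emoji and eight whitespace characters alike, which is the kind of input a random generator reaches far sooner than a hand-written example list does.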

Why LLMs Struggle with Edge Cases

LLMs generate code based on common patterns in training data. Edge cases are, by definition, uncommon, so LLMs often miss them:

// LLM generates this (seems correct):
function validatePassword(password: string): boolean {
  return password.length >= 8;
}

// Accepted even though it is a weak password:
validatePassword('12345678')  // true - only length is checked, not character types

// Edge cases it also accepts:
validatePassword('        ')   // true - 8 spaces pass but shouldn't
validatePassword('\u0000'.repeat(8))  // true - null bytes
validatePassword('\n'.repeat(8))  // true - newlines
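
The failures above can be turned into a property rather than a list of examples. The sketch below assumes a requirement the article implies but does not spell out: a password made only of blank or control characters must be rejected. The names `blankString`, `rejectsBlankPasswords`, and `validatePasswordStrict` are hypothetical, and the stricter rule is one possible fix, not the definitive one.

```typescript
// Implementation under test (from the article): checks length only.
function validatePassword(password: string): boolean {
  return password.length >= 8;
}

// Hypothetical generator: 8 to 16 characters drawn only from
// space, tab, newline, and the null byte.
const BLANKS = " \t\n\u0000";
function blankString(): string {
  const length = 8 + Math.floor(Math.random() * 9);
  let out = "";
  for (let i = 0; i < length; i++) {
    out += BLANKS.charAt(Math.floor(Math.random() * BLANKS.length));
  }
  return out;
}

// Property (assumed requirement): no blank-only password is accepted.
// Returns the first accepted blank password, or null if none is found.
function rejectsBlankPasswords(
  validate: (password: string) => boolean,
  runs: number = 100,
): string | null {
  for (let i = 0; i < runs; i++) {
    const candidate = blankString();
    if (validate(candidate)) {
      return candidate; // counterexample
    }
  }
  return null;
}

// One possible stricter rule (an assumption): count only characters
// that are neither whitespace nor ASCII control characters.
function validatePasswordStrict(password: string): boolean {
  const visible = password.replace(/[\s\x00-\x1f]/g, "");
  return visible.length >= 8;
}

const counterexample = rejectsBlankPasswords(validatePassword);
console.log("length-only rule:", JSON.stringify(counterexample));
console.log("stricter rule:", rejectsBlankPasswords(validatePasswordStrict));
```

The length-only rule fails on the very first generated input, since every blank string here has at least 8 characters, while the stricter rule survives all runs. A library such as fast-check would additionally shrink the failing input toward a minimal case.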

Note: This article is being expanded with more property-based testing examples.

