Property-Based Testing for LLM-Generated Code: Catching Edge Cases Automatically

James Phoenix

Summary

LLM-generated code often fails on edge cases that example-based tests don’t cover. Property-based testing uses libraries like fast-check to generate hundreds of random inputs and check invariants automatically, catching bugs that LLMs miss. Instead of asserting on a handful of hand-picked passwords, you state a property that should always hold, such as ‘every string of 8 or more characters is accepted’, and the framework generates inputs that try to falsify it.

The Problem

Example-based tests validate specific inputs but miss edge cases like empty strings, Unicode characters, boundary values, and unexpected formats. LLMs generate code that works for the provided examples but often fails on edge cases that were never explicitly tested. Writing exhaustive example tests is tedious and still incomplete.

The Solution

Property-based testing generates hundreds of random inputs and verifies that invariants (properties that should always be true) hold for all of them. Instead of testing ‘validatePassword(“12345678”) === true’, you test ‘every string of 8 or more characters validates as true’. When an input violates the property, the framework reports it as a counterexample, usually shrunk to a minimal failing case. Frameworks like fast-check (JavaScript/TypeScript), Hypothesis (Python), and QuickCheck (Haskell) automate edge case discovery.
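As a minimal sketch of the idea, the two length properties can be written with fast-check roughly as follows. The validator body is the length-only rule used in this article; the property names are hypothetical and chosen only for illustration.

```typescript
import * as fc from 'fast-check';

// Validator under test: the length-only rule discussed in this article.
function validatePassword(password: string): boolean {
  return password.length >= 8;
}

// Property 1: any generated string of 8 or more characters is accepted.
const acceptsLongStrings = fc.property(
  fc.string({ minLength: 8 }),
  (password) => validatePassword(password) === true
);

// Property 2: any generated string of 7 or fewer characters is rejected.
const rejectsShortStrings = fc.property(
  fc.string({ maxLength: 7 }),
  (password) => validatePassword(password) === false
);

// fc.assert throws, reporting a counterexample, if a property fails.
fc.assert(acceptsLongStrings);
fc.assert(rejectsShortStrings);
```

Both properties hold for this validator, which is itself informative: they only restate the length rule, so they cannot reveal whether that rule is the right specification.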


The Problem

When testing LLM-generated code, most developers write example-based tests:

test('password validation', () => {
  expect(validatePassword('12345678')).toBe(true);
  expect(validatePassword('short')).toBe(false);
  expect(validatePassword('verylongpassword123')).toBe(true);
});

This approach has a critical flaw: you only test the examples you thought of.

What Example Tests Miss

Edge cases that break LLM-generated code:

  1. Boundary values: What about exactly 8 characters? 7 characters?
  2. Empty inputs: Empty string, null, undefined
  3. Unicode: Emojis, combining characters, multi-byte code points
  4. Whitespace: Leading/trailing spaces, tabs, newlines
  5. Special formats: HTML entities, encoded strings, injection attempts
  6. Numeric boundaries: Number.MAX_SAFE_INTEGER, Number.MIN_SAFE_INTEGER, Infinity, -Infinity, NaN
  7. Type coercion: Numbers as strings, booleans as strings
  8. Array edge cases: Empty arrays, single-element arrays, very large arrays
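To make the categories above concrete, here is a dependency-free sketch of what a property-based runner does: it mixes hand-picked edge cases with randomly generated strings and reports every input that breaks a property. The helper names (lcg, randomString, findCounterexamples) and the alphabet are assumptions made for this illustration; a real framework such as fast-check adds far richer generators and shrinks failures to minimal cases.

```typescript
// Hypothetical stand-in for LLM-generated code: a length-only validator.
function validatePassword(password: string): boolean {
  return password.length >= 8;
}

// Simple linear congruential generator so runs are reproducible.
function lcg(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // value in [0, 1)
  };
}

// Alphabet that deliberately includes whitespace, non-ASCII, and markup characters.
const ALPHABET = ['a', 'Z', '0', ' ', '\t', '\n', 'é', '😀', '<', '&'];

function randomString(rand: () => number): string {
  const length = Math.floor(rand() * 17); // 0 to 16 alphabet entries
  let out = '';
  for (let i = 0; i < length; i++) {
    out += ALPHABET[Math.floor(rand() * ALPHABET.length)];
  }
  return out;
}

// Hand-picked seeds covering several categories from the list.
const EDGE_CASES: unknown[] = [
  '',              // empty string
  '1234567',       // boundary: 7 characters
  '12345678',      // boundary: exactly 8 characters
  '        ',      // whitespace only
  '😀'.repeat(4),   // 4 emoji, 8 UTF-16 code units
  '&lt;b&gt;',     // HTML entities
  '12345678901',   // number as string
  'true',          // boolean as string
  null,            // what an untyped JavaScript caller might pass
  undefined,
];

// Runs the property on every seed plus `runs` random strings and
// returns the inputs for which it returned false or threw.
function findCounterexamples(
  property: (input: unknown) => boolean,
  runs = 200,
  seed = 1
): unknown[] {
  const rand = lcg(seed);
  const inputs: unknown[] = [...EDGE_CASES];
  for (let i = 0; i < runs; i++) inputs.push(randomString(rand));
  return inputs.filter((input) => {
    try {
      return !property(input);
    } catch {
      return true; // a thrown error counts as a failure
    }
  });
}

// Property: the validator never throws and always returns a boolean.
const failures = findCounterexamples(
  (input) => typeof validatePassword(input as string) === 'boolean'
);
console.log(failures); // [ null, undefined ]
```

Every string input satisfies the property, but null and undefined make the validator throw when it reads .length. That is the "empty inputs" category from the list surfacing without anyone writing a dedicated test for it.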

Why LLMs Struggle with Edge Cases

LLMs generate code based on common patterns in training data. Edge cases are, by definition, uncommon, so LLMs often miss them:

// An LLM might generate this (looks correct at first glance):
function validatePassword(password: string): boolean {
  return password.length >= 8;
}

// Passes the example that was tested, but only the length is checked,
// not the character content:
validatePassword('12345678');           // true

// Edge cases it accepts but arguably should reject:
validatePassword('        ');           // true: 8 spaces
validatePassword('\u0000'.repeat(8));   // true: 8 null characters
validatePassword('\n'.repeat(8));       // true: 8 newlines
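A property can surface this class of bug without anyone listing the inputs by hand. The sketch below assumes the intended rule is that a password made up only of whitespace or null characters must be rejected; that rule, the blankPassword generator, and the property name are assumptions for illustration, not part of the original validator's specification.

```typescript
import * as fc from 'fast-check';

// The length-only validator under discussion.
function validatePassword(password: string): boolean {
  return password.length >= 8;
}

// Generator: strings of 8 to 20 characters drawn only from
// whitespace and the null character.
const blankPassword = fc
  .array(fc.constantFrom(' ', '\t', '\n', '\u0000'), { minLength: 8, maxLength: 20 })
  .map((chars) => chars.join(''));

// Property (assumed requirement): a password with no visible
// characters is never accepted.
const rejectsBlankPasswords = fc.property(
  blankPassword,
  (password) => validatePassword(password) === false
);

// fc.check returns the run details instead of throwing, which suits a demo;
// in a test suite, fc.assert(rejectsBlankPasswords) would fail the test.
const result = fc.check(rejectsBlankPasswords, { numRuns: 100 });
console.log(result.failed);         // true for the length-only validator
console.log(result.counterexample); // the shrunk failing input, wrapped in an array
```

Because every generated input is at least 8 characters long, the length-only validator accepts all of them and the property fails on the first run. fast-check then shrinks the failure toward a short, simple string, which makes the report easy to turn into a regression test.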

