Property-Based Testing for LLM-Generated Code: Catching Edge Cases Automatically

James Phoenix

Summary

LLM-generated code often fails on edge cases that example-based tests don’t cover. Property-based testing uses libraries like fast-check to generate hundreds of random inputs and check invariants automatically, catching bugs that LLMs miss. Instead of hand-picking a few passwords to test, you write properties that should always hold, and the framework generates test cases that try to break them.

The Problem

Example-based tests validate specific inputs but miss edge cases like empty strings, Unicode characters, boundary values, and unexpected formats. LLMs generate code that works for the provided examples but often fails on edge cases nobody tested explicitly. Writing exhaustive example tests is tedious and still incomplete.

The Solution

Property-based testing generates hundreds of random inputs and verifies that invariants (properties that should always be true) hold for all of them. Instead of testing ‘validatePassword(“12345678”) === true’, you test a property such as ‘every string shorter than 8 characters is rejected’. Frameworks like fast-check (JavaScript/TypeScript), Hypothesis (Python), and QuickCheck (Haskell) automate edge case discovery: when a property fails, they report the failing input and shrink it towards a minimal counterexample.
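The core loop is simple enough to sketch. The following is a deliberately tiny, dependency-free TypeScript illustration, not how fast-check is implemented: `pick`, `genString` and `checkProperty` are hypothetical helpers, the input alphabet is an arbitrary choice, and there is no shrinking. It reuses the length-only `validatePassword` that this article examines below.

```typescript
// Toy property checker, for illustration only. Real frameworks such as
// fast-check add shrinking, reproducible seeds and far richer generators.
type Gen<T> = () => T;

// Hypothetical helper: pick one element uniformly at random.
function pick<T>(items: T[]): T {
  return items[Math.floor(Math.random() * items.length)];
}

// Hypothetical generator: random strings with minLength..maxLength entries
// from the alphabet (every entry here is a single UTF-16 code unit).
function genString(alphabet: string[], minLength: number, maxLength: number): Gen<string> {
  return () => {
    const length = minLength + Math.floor(Math.random() * (maxLength - minLength + 1));
    let s = "";
    for (let i = 0; i < length; i++) s += pick(alphabet);
    return s;
  };
}

// Run the predicate on numRuns generated inputs; return the first input that
// violates it, or null if the property held for every input tried.
function checkProperty<T>(gen: Gen<T>, predicate: (value: T) => boolean, numRuns = 200): T | null {
  for (let run = 0; run < numRuns; run++) {
    const value = gen();
    if (!predicate(value)) return value;
  }
  return null;
}

// The length-only implementation discussed in this article.
function validatePassword(password: string): boolean {
  return password.length >= 8;
}

const printable = ["a", "Z", "5", " ", "\u00e9"];
const control = ["\u0000", "\t", "\n"];

// Property 1: every string shorter than 8 characters is rejected.
const shortFailure = checkProperty(genString(printable, 0, 7), (s) => !validatePassword(s));

// Property 2: every string containing a control character is rejected.
// This generator always appends one control character to 7..11 printable ones.
const genWithControl: Gen<string> = () => genString(printable, 7, 11)() + pick(control);
const controlFailure = checkProperty(genWithControl, (s) => !validatePassword(s));

console.log(shortFailure); // null: property 1 holds
console.log(JSON.stringify(controlFailure)); // a string ending in a control character
```

Property 1 holds, while property 2 fails on its very first run, because a length-only check never looks at which characters it was given. A real framework would then shrink that failing input before reporting it.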

The Problem

When testing LLM-generated code, most developers write example-based tests:

test('password validation', () => {
  expect(validatePassword('12345678')).toBe(true);
  expect(validatePassword('short')).toBe(false);
  expect(validatePassword('verylongpassword123')).toBe(true);
});

This approach has a critical flaw: you only test the examples you thought of.
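For comparison, here is a sketch of the same password test restated as properties with fast-check, assuming fast-check v3 or later is installed. `fc.property`, `fc.check`, `fc.string`, `fc.array` and `fc.constantFrom` are real fast-check APIs; the two properties are illustrative invariants chosen for this sketch, run against the length-only implementation examined later in this article.

```typescript
import fc from "fast-check";

// The length-only implementation discussed in this article.
function validatePassword(password: string): boolean {
  return password.length >= 8;
}

// Property: any string shorter than 8 characters is rejected.
const rejectsShort = fc.property(fc.string({ maxLength: 7 }), (s) => !validatePassword(s));

// Property: any 8+ character string made only of spaces, tabs and newlines is rejected.
const whitespaceOnly = fc
  .array(fc.constantFrom(" ", "\t", "\n"), { minLength: 8, maxLength: 20 })
  .map((chars) => chars.join(""));
const rejectsWhitespaceOnly = fc.property(whitespaceOnly, (s) => !validatePassword(s));

// fc.check runs a property (100 times by default) and returns a report
// instead of throwing, so both outcomes can be inspected side by side.
const shortReport = fc.check(rejectsShort);
const whitespaceReport = fc.check(rejectsWhitespaceOnly);

console.log(shortReport.failed); // false: the length check handles short input
console.log(whitespaceReport.failed); // true: whitespace-only input is accepted
console.log(whitespaceReport.counterexample); // the shrunk failing input, as a one-element tuple
```

In a Jest or Vitest file you would normally wrap each property in `fc.assert(...)` inside a `test(...)` call instead; `fc.assert` throws when a property fails and reports the shrunk counterexample along with the seed needed to replay the run.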

What Example Tests Miss

Edge cases that break LLM-generated code:

Boundary values: What about exactly 8 characters? 7 characters?
Empty inputs: Empty string, null, undefined
Unicode: Emojis, special characters, multi-byte Unicode characters
Whitespace: Leading/trailing spaces, tabs, newlines
Special formats: HTML entities, encoded strings, injection attempts
Numeric boundaries: MAX_INT, MIN_INT, infinity, NaN
Type coercion: Numbers as strings, booleans as strings
Array edge cases: Empty arrays, single-element arrays, very large arrays
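One way to cover these categories is to give each its own fast-check arbitrary and combine them with `fc.oneof`. This is a sketch assuming fast-check v3 or later; the specific constant values are illustrative picks rather than an exhaustive catalogue, and `validatePassword` is the same length-only function examined below.

```typescript
import fc from "fast-check";

// Length-only implementation under test.
function validatePassword(password: string): boolean {
  return password.length >= 8;
}

// Whitespace-only strings: spaces, tabs and newlines joined together.
const whitespace = fc
  .array(fc.constantFrom(" ", "\t", "\n"), { minLength: 1, maxLength: 12 })
  .map((chars) => chars.join(""));

// Strings drawn from several edge-case categories.
const edgeCaseStrings = fc.oneof(
  fc.constant(""), // empty input
  fc.string({ minLength: 7, maxLength: 9 }), // around the 8-character boundary
  whitespace, // whitespace-only input
  fc.constantFrom(
    "\uD83D\uDE00".repeat(4), // 4 emoji = 8 UTF-16 code units
    "e\u0301".repeat(8), // 8 visible letters, 16 code units (combining accents)
    "\u0000".repeat(8), // null bytes
  ),
  fc.constantFrom("&lt;script&gt;", "%3Cscript%3E", "' OR '1'='1"), // special formats
  fc.constantFrom("12345678", "true", "NaN"), // values that look like other types
);

// Numeric boundaries, plus arbitrary doubles.
const edgeCaseNumbers = fc.oneof(
  fc.constantFrom(Number.MAX_SAFE_INTEGER, Number.MIN_SAFE_INTEGER, Infinity, -Infinity, NaN, -0),
  fc.double(),
);

// Integer arrays, from empty up to 1000 elements.
const edgeCaseArrays = fc.array(fc.integer(), { maxLength: 1000 });

// Peek at generated values without running a property.
console.log(fc.sample(edgeCaseStrings, 5));
console.log(fc.sample(edgeCaseNumbers, 5));

// A cheap baseline property: whatever the input, the function returns a
// boolean rather than throwing or returning something else.
fc.assert(fc.property(edgeCaseStrings, (s) => typeof validatePassword(s) === "boolean"));
```

The baseline property only proves the function does not crash; its value is that each category from the list becomes a reusable input source for stronger invariants. In recent fast-check versions, `fc.double()` can itself produce NaN, ±Infinity and -0 unless options such as `noNaN` or `noDefaultInfinity` are set.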

Why LLMs Struggle with Edge Cases

LLMs generate code based on common patterns in training data. Edge cases are, by definition, uncommon, so LLMs often miss them:


// LLM generates this (looks correct):
function validatePassword(password: string): boolean {
  return password.length >= 8;
}

// Accepts weak passwords a real policy should reject:
validatePassword('12345678')  // true: only checks length, not character types!

// Edge cases that slip through (all return true):
validatePassword('        ')   // 8 spaces: passes but shouldn't!
validatePassword('\u0000'.repeat(8))  // 8 null bytes
validatePassword('\n'.repeat(8))  // 8 newlines
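A fix has to start from an explicit policy. The one below is an assumption made for this sketch, not a standard: at least 8 code points, at least one ASCII letter and one digit, and no control characters. The properties then encode the edge cases above, so the same kind of generators that expose the length-only version can confirm the stricter one (fast-check v3 or later assumed).

```typescript
import fc from "fast-check";

// Hypothetical policy (an assumption for this sketch, not a standard):
// >= 8 code points, >= 1 ASCII letter, >= 1 digit, no control characters.
function validatePassword(password: string): boolean {
  if (Array.from(password).length < 8) return false; // count code points, not UTF-16 units
  if (/[\u0000-\u001F\u007F]/.test(password)) return false; // NUL, tab, newline, DEL, ...
  if (!/[A-Za-z]/.test(password)) return false; // needs a letter
  if (!/[0-9]/.test(password)) return false; // needs a digit
  return true;
}

const whitespaceOnly = fc
  .array(fc.constantFrom(" ", "\t", "\n"), { minLength: 8, maxLength: 20 })
  .map((chars) => chars.join(""));

const rejectsShort = fc.property(fc.string({ maxLength: 7 }), (s) => !validatePassword(s));
const rejectsWhitespaceOnly = fc.property(whitespaceOnly, (s) => !validatePassword(s));
const rejectsControlChars = fc.property(
  fc.tuple(fc.string(), fc.constantFrom("\u0000", "\t", "\n"), fc.string()),
  ([before, control, after]) => !validatePassword(before + control + after),
);
const acceptsLettersAndDigits = fc.property(
  fc.array(fc.constantFrom(..."abcXYZ".split("")), { minLength: 4, maxLength: 10 }),
  fc.array(fc.constantFrom(..."0123456789".split("")), { minLength: 4, maxLength: 10 }),
  (letters, digits) => validatePassword(letters.join("") + digits.join("")),
);

// fc.assert throws (with a shrunk counterexample and seed) if a property fails.
fc.assert(rejectsShort);
fc.assert(rejectsWhitespaceOnly);
fc.assert(rejectsControlChars);
fc.assert(acceptsLettersAndDigits);
console.log("all four properties passed");
```

Run against the original length-only function, the whitespace and control-character properties would be expected to fail, which is exactly the signal the example-based tests never gave. Counting with `Array.from` also means a single-code-point emoji counts as one character, itself a policy decision worth stating explicitly.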

Note: This article is being expanded with more property-based testing examples.

Related Concepts

Test-Based Regression Patching – Write tests before fixing bugs
Quality Gates as Information Filters – Tests filter out invalid solutions
Test-Driven Prompting – Write tests before generating code
