Property-Based Testing for LLM-Generated Code: Catching Edge Cases Automatically

James Phoenix

Summary

LLM-generated code often fails on edge cases that example-based tests don’t cover. Property-based testing uses libraries like fast-check to generate hundreds of random inputs and check invariants automatically, catching many of the bugs that LLMs miss. Instead of hand-picking inputs such as a single 8-character password, you state properties that should always hold true, and the framework generates test cases that try to falsify them.

The Problem

Example-based tests validate specific inputs but miss edge cases like empty strings, unicode characters, boundary values, and unexpected formats. LLMs generate code that works for provided examples but often fails on edge cases not explicitly tested. Writing exhaustive example tests is tedious and still incomplete.

The Solution

Property-based testing generates hundreds of random inputs and verifies that invariants (properties that should always be true) hold for all of them. Instead of testing ‘validatePassword(“12345678”) === true’, you test ‘all strings >= 8 chars should validate as true’. Frameworks like fast-check, Hypothesis (Python), and QuickCheck (Haskell) automate edge case discovery.
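As a minimal sketch of that idea, the property "all strings >= 8 chars should validate as true" can be written with fast-check roughly as follows. The validatePassword body here is an assumed length-only implementation used purely for illustration, and the second property (shorter strings are rejected) is an added assumption about the intended rule, not something the validator's real spec is known to require.

```typescript
import fc from 'fast-check';

// Assumed implementation under test: a length-only rule, for illustration.
function validatePassword(password: string): boolean {
  return password.length >= 8;
}

// Property: every generated string with at least 8 characters validates as true.
const longStringsPass = fc.property(
  fc.string({ minLength: 8 }),
  (password) => validatePassword(password),
);

// Assumed companion property: every string shorter than 8 characters is rejected.
const shortStringsFail = fc.property(
  fc.string({ maxLength: 7 }),
  (password) => !validatePassword(password),
);

// fc.assert throws if a property fails, reporting a shrunk counterexample.
fc.assert(longStringsPass, { numRuns: 500 });
fc.assert(shortStringsFail, { numRuns: 500 });
```

Both properties hold for the length-only rule, which is the point: a property is only as strong as the invariant you state. A stricter invariant (for example, "accepted passwords contain a visible character") is what exposes the gaps.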

The Problem

When testing LLM-generated code, most developers write example-based tests:

test('password validation', () => {
  expect(validatePassword('12345678')).toBe(true);
  expect(validatePassword('short')).toBe(false);
  expect(validatePassword('verylongpassword123')).toBe(true);
});

This approach has a critical flaw: you only test the examples you thought of.

What Example Tests Miss

Edge cases that break LLM-generated code:

  1. Boundary values: What about exactly 8 characters? 7 characters?
  2. Empty inputs: Empty string, null, undefined
  3. Unicode: Emojis, special characters, multi-byte unicode
  4. Whitespace: Leading/trailing spaces, tabs, newlines
  5. Special formats: HTML entities, encoded strings, injection attempts
  6. Numeric boundaries: MAX_INT, MIN_INT, infinity, NaN
  7. Type coercion: Numbers as strings, booleans as strings
  8. Array edge cases: Empty arrays, single-element arrays, very large arrays
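One way to make the list above concrete is a small seed pool with a few inputs per category, swept against the function under test. This is a dependency-free sketch, not fast-check's API: edgeCaseSeeds, classify, and sweep are hypothetical names, the inputs are illustrative picks, and validatePassword is an assumed length-only implementation. Non-string inputs are passed on purpose, since untyped JavaScript callers can bypass the TypeScript signature.

```typescript
// Assumed implementation under test: a length-only rule, for illustration.
function validatePassword(password: string): boolean {
  return password.length >= 8;
}

// Hypothetical seed pool: a few concrete inputs per category from the list.
const edgeCaseSeeds: Record<string, unknown[]> = {
  boundary: ['a'.repeat(7), 'a'.repeat(8), 'a'.repeat(9)],
  empty: ['', null, undefined],
  unicode: ['😀'.repeat(4), 'é'.repeat(8)],
  whitespace: [' '.repeat(8), '\t'.repeat(8), '  padded  '],
  specialFormats: ['&lt;b&gt;', '%3Cscript%3E', "' OR 1=1 --"],
  numeric: [Number.MAX_SAFE_INTEGER, Number.MIN_SAFE_INTEGER, Infinity, NaN],
  coercion: ['12345678', 'true', 12345678],
  arrays: [[], ['x'], new Array(10000).fill('x')],
};

type Outcome = 'accepted' | 'rejected' | 'threw';

// Run one input and record whether it was accepted, rejected, or crashed.
function classify(fn: (input: unknown) => boolean, input: unknown): Outcome {
  try {
    return fn(input) ? 'accepted' : 'rejected';
  } catch {
    return 'threw';
  }
}

// Sweep every seed in every category and collect the outcomes.
function sweep(fn: (input: unknown) => boolean): Record<string, Outcome[]> {
  const report: Record<string, Outcome[]> = {};
  for (const [category, inputs] of Object.entries(edgeCaseSeeds)) {
    report[category] = inputs.map((input) => classify(fn, input));
  }
  return report;
}

// The cast simulates an untyped caller passing arbitrary values.
const report = sweep(validatePassword as (input: unknown) => boolean);
```

Against the length-only rule, the sweep shows null and undefined throwing, every whitespace seed being accepted, four emojis counting as eight characters (length counts UTF-16 code units), and a 10,000-element array being accepted because it also has a length.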

Why LLMs Struggle with Edge Cases

LLMs generate code based on common patterns in training data. Edge cases are, by definition, uncommon, so LLMs often miss them:

// An LLM might generate this (looks correct):
function validatePassword(password: string): boolean {
  return password.length >= 8;
}

// It passes the happy-path example, but only because it checks length
// and nothing else (no character-type rules):
validatePassword('12345678');           // true

// Inputs that also return true, although a password policy would normally reject them:
validatePassword('        ');           // 8 spaces
validatePassword('\u0000'.repeat(8));   // 8 null characters
validatePassword('\n'.repeat(8));       // 8 newlines
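A property can find inputs like these without anyone listing them by hand. The sketch below is a hand-rolled, dependency-free illustration of the mechanism rather than fast-check itself: mulberry32, randomString, hasVisibleCharacter, and findCounterexample are names introduced here, the alphabet is deliberately biased toward invisible characters, and the invariant "an accepted password contains at least one visible character" is an assumed policy. A real framework would also shrink the failing input to a minimal one, which this sketch does not do.

```typescript
// Length-only validator from the example above.
function validatePassword(password: string): boolean {
  return password.length >= 8;
}

// Small seeded PRNG (mulberry32) so a failing run can be reproduced.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Alphabet biased toward whitespace and control characters (an assumption).
const ALPHABET = 'a \t\n\u0000';

// Generate a string of length 0..16 from the biased alphabet.
function randomString(rand: () => number): string {
  const length = Math.floor(rand() * 17);
  let out = '';
  for (let i = 0; i < length; i++) {
    out += ALPHABET.charAt(Math.floor(rand() * ALPHABET.length));
  }
  return out;
}

// True if the string has a character that is neither whitespace nor a control code.
function hasVisibleCharacter(s: string): boolean {
  return /[^\s\u0000-\u001f]/.test(s);
}

// Return the first generated input that falsifies the property, if any.
function findCounterexample(
  property: (s: string) => boolean,
  runs: number,
  seed: number,
): string | undefined {
  const rand = mulberry32(seed);
  for (let i = 0; i < runs; i++) {
    const candidate = randomString(rand);
    if (!property(candidate)) return candidate;
  }
  return undefined;
}

// Assumed invariant: accepted passwords contain at least one visible character.
const acceptedImpliesVisible = (s: string): boolean =>
  !validatePassword(s) || hasVisibleCharacter(s);

const counterexample = findCounterexample(acceptedImpliesVisible, 2000, 42);
```

With the length-only validator, counterexample ends up holding a string of eight or more invisible characters that was accepted anyway, the same class of bug as the hand-written cases above.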

Note: This article is being expanded with more property-based testing examples.
