Summary
Write tests before prompting LLMs to generate code. Tests act as executable specifications that constrain the solution space, reducing entropy from millions of possible implementations to tens of correct ones. This pattern improves code quality, reduces iteration cycles, and builds a regression safety net automatically.
The Problem
LLMs generate code from high-entropy probability distributions, often producing syntactically correct but behaviorally wrong implementations. Without constraints, you waste time iterating on vague requirements, and bugs slip through unnoticed. The LLM doesn’t know what ‘correct’ means beyond syntax.
The Solution
Write tests first that define expected behavior as executable specifications. Then prompt the LLM to implement code that passes those tests. Tests constrain the solution space from millions of possible programs to a small set of correct implementations, dramatically improving first-pass quality and eliminating ambiguity.
The Core Insight
When you ask an LLM to “implement user authentication,” you’re asking it to sample from a probability distribution of millions of possible implementations. Most are wrong.
When you give an LLM a failing test and ask it to “make this test pass,” you’re constraining the solution space to tens of correct implementations. Most are right.
The difference: Tests are executable specifications that reduce entropy before generation.
The Problem: High-Entropy Code Generation
Without Tests (Vague Prompt)
Prompt: "Implement user authentication"
LLM considers:
- Should it hash passwords? (bcrypt? argon2? SHA256?)
- Should it validate email format? (regex? library? not at all?)
- Should it throw errors or return null? (which one?)
- Should it use sessions or JWT? (depends?)
- Should it rate-limit login attempts? (maybe?)
- Should it handle case-sensitive emails? (unclear?)
Possible implementations: ~1,000,000
Correct implementations: ~10
Success rate: 0.001%
The LLM makes thousands of micro-decisions without guidance. Most choices are wrong.
With Tests (Executable Specification)
// You write this FIRST
describe('authenticateUser', () => {
it('should return user object for valid credentials', async () => {
const result = await authenticateUser('user@example.com', 'password123');
expect(result).toMatchObject({
id: expect.any(String),
email: 'user@example.com',
sessionToken: expect.any(String)
});
});
it('should throw InvalidCredentialsError for wrong password', async () => {
await expect(
authenticateUser('user@example.com', 'wrong')
).rejects.toThrow(InvalidCredentialsError);
});
it('should validate email format', async () => {
await expect(
authenticateUser('not-an-email', 'password123')
).rejects.toThrow(InvalidEmailError);
});
it('should hash passwords with bcrypt', async () => {
// Verifies implementation uses bcrypt, not plaintext
const user = await createUser('user@example.com', 'password');
expect(user.passwordHash).toMatch(/^\$2[aby]\$/);
});
});
Prompt: "Implement authenticateUser() that passes these tests"
LLM now knows:
- Must return specific object shape (not null, not boolean)
- Must throw specific error types (not generic Error)
- Must validate email format
- Must use bcrypt for hashing
Possible implementations: ~50
Correct implementations: ~30
Success rate: 60%
Result: a 60,000x improvement in success rate (0.001% → 60%).
How Test-Driven Prompting Works
The Workflow
1. Write Tests (Specification)
↓
Define expected behavior as executable code
2. Verify Tests Fail (Red)
↓
Ensure tests actually test something
3. Prompt LLM with Tests (Generation)
↓
"Implement code that makes these tests pass"
4. Run Tests (Verification)
↓
Tests pass → Done
Tests fail → Iterate with failure feedback
5. Commit Both (Safety Net)
↓
Test + implementation together
Example: Building a URL Shortener
Step 1: Write Tests First
// url-shortener.test.ts
import { shortenUrl, expandUrl, NotFoundError, InvalidUrlError } from './url-shortener';
describe('URL Shortener', () => {
it('should generate short URL for long URL', async () => {
const shortUrl = await shortenUrl('https://example.com/very/long/path?query=params');
expect(shortUrl).toMatch(/^[a-zA-Z0-9]{6}$/);
expect(shortUrl.length).toBe(6);
});
it('should expand short URL back to original', async () => {
const original = 'https://example.com/path';
const short = await shortenUrl(original);
const expanded = await expandUrl(short);
expect(expanded).toBe(original);
});
it('should throw NotFoundError for unknown short URL', async () => {
await expect(
expandUrl('XXXXXX')
).rejects.toThrow(NotFoundError);
});
it('should not create duplicate short URLs for same long URL', async () => {
const url = 'https://example.com/same';
const short1 = await shortenUrl(url);
const short2 = await shortenUrl(url);
expect(short1).toBe(short2);
});
it('should validate URL format', async () => {
await expect(
shortenUrl('not-a-url')
).rejects.toThrow(InvalidUrlError);
});
});
Step 2: Verify Tests Fail
npm test
# Output:
# ✗ should generate short URL for long URL
# Module './url-shortener' not found
Good! The tests fail because the implementation doesn’t exist yet.
Step 3: Prompt LLM with Tests
Prompt to LLM:
"Implement a URL shortener in url-shortener.ts that passes all tests in url-shortener.test.ts.
Requirements from tests:
- shortenUrl(url): Generate 6-character alphanumeric short code
- expandUrl(code): Return original URL
- Throw NotFoundError for unknown codes
- Return same short code for duplicate URLs
- Validate URL format
Use an in-memory Map for storage (no database needed for this example)."
Step 4: LLM Generates Implementation
// url-shortener.ts (generated by LLM)
import { createHash } from 'crypto';
export class InvalidUrlError extends Error {
constructor() {
super('Invalid URL format');
}
}
export class NotFoundError extends Error {
constructor() {
super('Short URL not found');
}
}
const urlMap = new Map<string, string>();
const reverseMap = new Map<string, string>();
function isValidUrl(url: string): boolean {
try {
new URL(url);
return true;
} catch {
return false;
}
}
function generateShortCode(url: string): string {
// Generate deterministic short code from URL hash
const hash = createHash('sha256').update(url).digest('base64');
return hash.replace(/[^a-zA-Z0-9]/g, '').slice(0, 6);
}
export async function shortenUrl(url: string): Promise<string> {
if (!isValidUrl(url)) {
throw new InvalidUrlError();
}
// Return existing short code if URL already shortened
if (reverseMap.has(url)) {
return reverseMap.get(url)!;
}
const shortCode = generateShortCode(url);
urlMap.set(shortCode, url);
reverseMap.set(url, shortCode);
return shortCode;
}
export async function expandUrl(shortCode: string): Promise<string> {
if (!urlMap.has(shortCode)) {
throw new NotFoundError();
}
return urlMap.get(shortCode)!;
}
Step 5: Run Tests
npm test
# Output:
# ✓ should generate short URL for long URL
# ✓ should expand short URL back to original
# ✓ should throw NotFoundError for unknown short URL
# ✓ should not create duplicate short URLs for same long URL
# ✓ should validate URL format
#
# All tests passed!
Success on first try! The tests constrained the solution space enough that the LLM generated correct code.
Why This Works: Entropy Reduction
Information Theory Perspective
Tests reduce entropy in two ways:
1. Pre-Generation (Constraining the Prompt)
Tests tell the LLM what to generate:
Without tests:
Entropy = log₂(possible_implementations) = log₂(1,000,000) ≈ 20 bits
With tests:
Entropy = log₂(implementations_that_pass_tests) = log₂(50) ≈ 6 bits
Reduction: 20 - 6 = 14 bits (99.99% of invalid implementations eliminated)
2. Post-Generation (Verifying the Output)
Tests verify correctness:
LLM generates code
↓
Run tests
├→ Pass: Code is correct (with high confidence)
└→ Fail: Code is wrong (with certainty)
↓
Provide failure feedback to LLM
↓
LLM regenerates with additional constraints
Each test failure further constrains the solution space.
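As a sanity check on the bit arithmetic above, a few lines of TypeScript reproduce the numbers (the implementation counts are the illustrative figures from this section, not measurements):
// Illustrative entropy arithmetic using the counts quoted above
const withoutTests = 1_000_000; // plausible implementations for a vague prompt
const withTests = 50;           // implementations that still pass the tests
const bitsWithout = Math.log2(withoutTests); // ≈ 19.9 bits
const bitsWith = Math.log2(withTests);       // ≈ 5.6 bits
console.log(`Entropy reduction: ${(bitsWithout - bitsWith).toFixed(1)} bits`);                    // ≈ 14.3
console.log(`Implementations eliminated: ${((1 - withTests / withoutTests) * 100).toFixed(3)}%`); // 99.995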
Mathematical Model
Let:
- $S$ = set of all syntactically valid programs
- $T_i$ = set of programs that pass test $i$
- $C$ = set of correct programs
Test-Driven Prompting ensures:
LLM generates from: S ∩ T₁ ∩ T₂ ∩ ... ∩ Tₙ
Instead of: S
Where: S ∩ T₁ ∩ T₂ ∩ ... ∩ Tₙ ≈ C
The more tests you write, the closer this intersection gets to the set of correct programs.
Best Practices
1. Write Tests for Behavior, Not Implementation
// ✅ Good: Tests behavior
it('should return sorted array', () => {
expect(sort([3, 1, 2])).toEqual([1, 2, 3]);
});
// LLM can choose: quicksort, mergesort, bubblesort, Array.sort(), etc.
// ❌ Bad: Tests implementation details
it('should use quicksort algorithm', () => {
const spy = jest.spyOn(algorithms, 'quicksort');
sort([3, 1, 2]);
expect(spy).toHaveBeenCalled();
});
// Over-constrains: LLM must use specific algorithm
Rule: Test what the code should do, not how it should do it.
2. Cover Edge Cases in Tests
describe('divide', () => {
// Happy path
it('should divide two numbers', () => {
expect(divide(10, 2)).toBe(5);
});
// Edge cases (these prevent bugs)
it('should throw error for division by zero', () => {
expect(() => divide(10, 0)).toThrow(DivisionByZeroError);
});
it('should handle negative numbers', () => {
expect(divide(-10, 2)).toBe(-5);
});
it('should handle floating point precision', () => {
expect(divide(1, 3)).toBeCloseTo(0.333, 3);
});
});
Edge case tests prevent common LLM mistakes.
3. Use Integration Tests for LLM Code
LLMs struggle with mocking. Integration tests work better:
// ✅ Good for LLMs: Integration test
describe('POST /api/users', () => {
it('should create user and return 201', async () => {
const response = await request(app)
.post('/api/users')
.send({ email: 'newuser@example.com', password: 'password123' });
expect(response.status).toBe(201);
expect(response.body).toMatchObject({
id: expect.any(String),
email: 'newuser@example.com'
});
// Verify user exists in database
const user = await db.users.findByEmail('newuser@example.com');
expect(user).toBeDefined();
});
});
// ❌ Harder for LLMs: Unit test with mocks
it('should call userRepository.create', async () => {
const mockRepo = {
create: jest.fn().mockResolvedValue({ id: '1', email: 'user@example.com' })
};
// LLMs often generate incorrect mock setups
});
Why: Integration tests are closer to natural language requirements.
4. Make Tests Self-Documenting
Tests should read like specifications:
// ✅ Good: Clear, descriptive
describe('User Registration', () => {
it('should create user account with hashed password', async () => { });
it('should send verification email to user', async () => { });
it('should reject duplicate email addresses', async () => { });
it('should require password minimum 8 characters', async () => { });
});
// ❌ Bad: Vague, unclear
describe('Users', () => {
it('works', async () => { });
it('handles errors', async () => { });
});
5. Provide Test Data in Tests
Don’t make the LLM guess example data:
// ✅ Good: Concrete test data
it('should validate email format', async () => {
await expect(validateEmail('user@example.com')).resolves.toBe(true);
await expect(validateEmail('invalid')).resolves.toBe(false);
await expect(validateEmail('user@domain')).resolves.toBe(false);
await expect(validateEmail('@example.com')).resolves.toBe(false);
});
// LLM knows exactly what to validate
// ❌ Bad: Vague test
it('should validate email format', async () => {
expect(validateEmail(validEmail)).toBe(true);
expect(validateEmail(invalidEmail)).toBe(false);
});
// What are validEmail and invalidEmail?
Advanced Patterns
Pattern 1: Incremental Test-Driven Prompting
Build complex features incrementally:
// Iteration 1: Basic functionality
describe('UserService (v1)', () => {
it('should create user', async () => { });
it('should find user by id', async () => { });
});
Prompt: "Implement basic UserService"
// ✅ Tests pass
// Iteration 2: Add validation
describe('UserService (v2)', () => {
// Keep existing tests
it('should create user', async () => { });
it('should find user by id', async () => { });
// Add new tests
it('should reject invalid email on create', async () => { });
it('should reject duplicate emails', async () => { });
});
Prompt: "Add email validation to UserService"
// ✅ All tests pass (including old ones = no regression)
// Iteration 3: Add authentication
describe('UserService (v3)', () => {
// Keep all previous tests...
// Add auth tests
it('should authenticate user with correct password', async () => { });
it('should reject incorrect password', async () => { });
});
Prompt: "Add authentication to UserService"
Benefits:
- Each iteration builds on previous work
- Old tests prevent regressions
- Complexity grows gradually
Pattern 2: Test-Driven Refactoring
Refactor with confidence:
// Step 1: Write tests for existing behavior
describe('UserService (current behavior)', () => {
it('should create user (existing test)', async () => { });
it('should find user by id (existing test)', async () => { });
// Document all current behavior
});
Prompt: "Refactor UserService to use factory pattern instead of class"
// Step 2: After refactoring, tests still pass
// ✅ Behavior preserved despite implementation change
Pattern 3: Property-Based Test-Driven Prompting
Use property-based testing for stronger constraints:
import { fc, test } from '@fast-check/vitest';
// Property: sorting should always produce ordered array
test.prop([fc.array(fc.integer())])
('sorted array should be in ascending order', (arr) => {
const sorted = sort(arr);
for (let i = 0; i < sorted.length - 1; i++) {
expect(sorted[i]).toBeLessThanOrEqual(sorted[i + 1]);
}
});
// Property: sorting should preserve all elements
test.prop([fc.array(fc.integer())])
('sorted array should contain same elements', (arr) => {
const sorted = sort(arr);
expect([...sorted].sort((a, b) => a - b)).toEqual([...arr].sort((a, b) => a - b));
});
Prompt: "Implement sort() that satisfies these properties for ANY input array"
Why it’s powerful: properties constrain behavior for arbitrary inputs, not just a handful of hand-picked examples.
Pattern 4: Snapshot Test-Driven Prompting
For complex outputs:
it('should generate correct ESLint config', () => {
const config = generateESLintConfig({
typescript: true,
react: true,
strictMode: true
});
expect(config).toMatchSnapshot();
});
Prompt: "Implement generateESLintConfig() that produces this snapshot"
Use case: Config generation, code transformation, complex object outputs.
Handling Test Failures
Iteration Loop with Feedback
When tests fail, feed the failure details back to the LLM:
Iteration 1:
Prompt: "Implement authenticateUser that passes tests"
LLM: [generates code]
Tests: ❌ FAIL
✗ should throw InvalidCredentialsError for wrong password
Expected: InvalidCredentialsError
Received: null
Iteration 2:
Prompt: "Fix: Test failed because function returns null instead of throwing InvalidCredentialsError. Update implementation."
LLM: [fixes code]
Tests: ✅ PASS
Key: Include specific failure messages in next prompt.
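A minimal sketch of this loop in TypeScript, assuming hypothetical generateCode() and runTests() helpers that you would wire to your own LLM client and test runner:
// Sketch of an automated iterate-on-failure loop. generateCode() and runTests()
// are hypothetical helpers (your LLM client and test runner), not a real API.
interface TestResult {
  passed: boolean;
  failureOutput: string; // the verbatim test-runner failure summary
}
async function implementUntilGreen(
  taskPrompt: string,
  generateCode: (prompt: string) => Promise<void>, // writes the implementation to disk
  runTests: () => Promise<TestResult>,
  maxIterations = 3
): Promise<boolean> {
  let prompt = taskPrompt;
  for (let i = 0; i < maxIterations; i++) {
    await generateCode(prompt);
    const result = await runTests();
    if (result.passed) return true;
    // Carry the specific failure messages forward as additional constraints.
    prompt = `${taskPrompt}\n\nThe previous attempt failed these tests:\n` +
      `${result.failureOutput}\nFix the implementation so they pass.`;
  }
  return false;
}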
Common Failure Patterns
Failure: Wrong Return Type
Test expects: { id: string, email: string }
Code returns: User object with 20 fields
Fix prompt: "Return only {id, email}, not full User object"
Failure: Missing Error Handling
Test expects: InvalidEmailError thrown
Code does: Returns false
Fix prompt: "Throw InvalidEmailError for invalid email, don't return boolean"
Failure: Wrong Validation Logic
Test expects: Reject 'user@domain' (no TLD)
Code does: Accepts it
Fix prompt: "Email validation should require TLD (.com, .org, etc.)"
Measuring Success
Metric 1: First-Pass Success Rate
How often does generated code pass all tests immediately?
Without test-driven prompting: ~10-20% first-pass success
With test-driven prompting: ~50-70% first-pass success
Improvement: 3-7x better
Metric 2: Iteration Count
How many prompt iterations needed?
Without tests:
- Iteration 1: Generate code
- Iteration 2: Fix bug A
- Iteration 3: Fix bug B
- Iteration 4: Fix bug C
- Iteration 5: Finally works
Average: 5 iterations
With tests:
- Iteration 1: Generate code that passes tests
- Iteration 2: Fix test failures (if any)
Average: 1.5 iterations
Improvement: 3x fewer iterations
Metric 3: Test Coverage
Test coverage grows automatically:
Traditional approach:
- Write code
- Manually add tests later (if time permits)
Coverage: 30-50%
Test-driven prompting:
- Tests written before code
- Implementation matches tests exactly
Coverage: 80-95%
Improvement: 2x higher coverage
Metric 4: Regression Rate
How often do bugs reappear?
Without tests: 40% regression rate
(LLM regenerates bug in future iterations)
With tests: 2% regression rate
(Tests prevent LLM from regenerating bugs)
Improvement: 20x fewer regressions
Integration with Other Patterns
Combine with Claude Code Hooks
Automate test execution:
// .claude/hooks/post-write.json
{
"command": "npm test -- --related {file}",
"description": "Run tests after LLM writes code"
}
Now tests run automatically after every code generation.
Combine with Verification Sandwich
Pre-generation (reduce entropy):
├─ Write tests (executable specification)
├─ Provide type signatures
└─ Include example implementations
↓ LLM generates code
Post-generation (verify):
├─ Run tests (behavioral verification)
├─ Type check (structural verification)
└─ Lint (style verification)
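One way to wire the post-generation half together is a small script that runs the three checks in sequence. This is a sketch assuming a typical npm + TypeScript + ESLint setup; swap in your project’s actual commands:
// verify.ts – run the post-generation checks in order (sketch; commands assume
// a standard npm + tsc + ESLint project and may differ in yours)
import { execSync } from 'node:child_process';
const checks = [
  { name: 'tests (behavioral verification)', command: 'npm test --silent' },
  { name: 'types (structural verification)', command: 'npx tsc --noEmit' },
  { name: 'lint (style verification)', command: 'npx eslint .' },
];
for (const check of checks) {
  try {
    execSync(check.command, { stdio: 'inherit' });
    console.log(`✓ ${check.name}`);
  } catch {
    console.error(`✗ ${check.name} failed – feed the output back to the LLM`);
    process.exit(1);
  }
}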
Combine with Test-Based Regression Patching
Bug discovered:
├─ Write test that catches bug (test-driven prompting)
├─ Prompt LLM to fix bug
├─ Test passes (regression patched)
└─ Test prevents future regressions
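For example, suppose a bug report showed the URL shortener from earlier happily shortening ftp:// URLs that the product should reject (a hypothetical bug for illustration). The first step is a test that reproduces it:
// url-shortener.test.ts – regression test written before the fix (hypothetical bug)
it('should reject non-http(s) URLs', async () => {
  await expect(
    shortenUrl('ftp://example.com/file.zip')
  ).rejects.toThrow(InvalidUrlError);
});
Prompt the LLM to make this test pass; once it does, the test stays in the suite and the same bug cannot silently return.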
Common Pitfalls
❌ Pitfall 1: Writing Tests After Code
This defeats the purpose:
// Wrong order
1. Prompt: "Implement user authentication"
2. LLM generates code
3. You write tests to match what LLM generated
Problem: Tests confirm what was built, not what should be built
Solution: Tests come first, always.
❌ Pitfall 2: Tests Too Vague
// Too vague
it('should work correctly', () => {
const result = doSomething();
expect(result).toBeTruthy();
});
// Specific
it('should return array of user objects with id and email fields', () => {
const result = getUsers();
expect(result).toEqual([
{ id: '1', email: 'user1@example.com' },
{ id: '2', email: 'user2@example.com' }
]);
});
❌ Pitfall 3: Over-Specifying Implementation
// Over-specified (tests implementation)
it('should use bcrypt with 10 rounds and salt', () => {
const spy = jest.spyOn(bcrypt, 'hash');
hashPassword('password');
expect(spy).toHaveBeenCalledWith('password', 10);
});
// Better (tests behavior)
it('should produce different hashes for same password', () => {
const hash1 = hashPassword('password');
const hash2 = hashPassword('password');
expect(hash1).not.toBe(hash2);
expect(hash1).toMatch(/^\$2[aby]\$/);
});
❌ Pitfall 4: No Test Verification
Always verify tests fail before implementation:
# Step 1: Write test
# Step 2: Run test (should FAIL)
npm test
# ✗ Test fails (good!)
# Step 3: Generate implementation
# Step 4: Run test (should PASS)
npm test
# ✓ Test passes (good!)
If test passes without implementation, it’s not testing anything.
When to Use Test-Driven Prompting
✅ Always Use For:
- New features: Tests define requirements
- Bug fixes: Tests catch regressions
- Refactoring: Tests verify behavior preservation
- Complex logic: Tests clarify edge cases
- APIs: Tests define contracts
⚠️ Consider Alternatives For:
- Exploratory prototypes: Tests might slow exploration
- UI layout: Visual tests are harder to write
- Performance optimization: Need benchmarks, not tests
- One-time scripts: Tests might be overkill
❌ Don’t Use For:
- Simple configuration: Tests add more complexity than value
- Documentation: Tests aren’t the right format
Conclusion
Test-Driven Prompting transforms how you work with LLMs:
Without tests:
- High entropy (millions of possible implementations)
- Vague requirements
- Many iterations
- Low first-pass success rate
- Bugs slip through
- No regression prevention
With tests:
- Low entropy (tens of correct implementations)
- Precise requirements
- Few iterations
- High first-pass success rate
- Tests catch bugs automatically
- Permanent regression prevention
Key Takeaways:
- Write tests before prompting – they’re executable specifications
- Tests reduce entropy – constrain solution space from millions to tens
- Verify tests fail – ensure they test something
- Provide failure feedback – help LLM iterate correctly
- Use integration tests – work better with LLMs than unit tests
- Build coverage automatically – tests and code grow together
The Result: Higher quality code, fewer iterations, automatic regression prevention, and a growing safety net that makes your codebase more robust over time.
Test-Driven Prompting isn’t just good practice. It’s information-theoretic optimization of LLM code generation.
Mathematical Foundation
$$S_{\text{constrained}} = S \cap T_1 \cap T_2 \cap \cdots \cap T_n \approx C$$
How Tests Constrain the Solution Space
This formula shows how tests narrow down possible implementations to correct ones.
$S_{\text{constrained}}$ – The constrained solution space
This is the set of programs the LLM will actually generate from. With tests, this becomes much smaller than the original space.
$S$ – All syntactically valid programs
This is the starting point: every program that compiles and runs without syntax errors.
Example: For “implement authentication”, this might be:
- 1,000,000 different implementations
- Most are wrong (return wrong types, missing validation, etc.)
- LLM picks from this massive space
$\cap$ – Intersection (AND)
The intersection symbol means “only programs that satisfy ALL conditions”.
Think of it as applying filters:
Programs that are syntactically valid
AND pass test 1
AND pass test 2
AND pass test 3
...
$T_i$ – Programs that pass test i
$T_1$ = set of programs that pass the first test
$T_2$ = set of programs that pass the second test
$T_n$ = set of programs that pass the nth test
Example tests:
- $T_1$: Programs that return a {id, email, sessionToken} object
- $T_2$: Programs that throw InvalidCredentialsError for wrong password
- $T_3$: Programs that validate email format
- $T_4$: Programs that hash passwords with bcrypt
Each test eliminates programs that don’t meet that requirement.
$\approx C$ – Approximately equals correct programs
The symbol $\approx$ means “approximately equal to”.
$C$ is the set of correct programs: implementations that actually solve the problem correctly.
The formula says: When you intersect all test constraints, you get very close to the set of correct programs.
Concrete Example
Let’s trace through authentication:
S = All syntactically valid auth implementations
= 1,000,000 programs
T₁ = Programs that return correct object shape
= 100,000 programs (90% eliminated)
T₂ = Programs that throw correct error types
= 10,000 programs (90% of remaining eliminated)
T₃ = Programs that validate email format
= 1,000 programs (90% of remaining eliminated)
T₄ = Programs that use bcrypt hashing
= 100 programs (90% of remaining eliminated)
T₅ = Programs that handle edge cases
= 50 programs (50% of remaining eliminated)
S_constrained = S ∩ T₁ ∩ T₂ ∩ T₃ ∩ T₄ ∩ T₅
= 50 programs
C = Correct programs
≈ 30-40 programs
Overlap: S_constrained ∩ C ≈ 30 programs
Result: Instead of picking from 1,000,000 programs (0.003% correct), LLM picks from 50 programs (60% correct).
Visual Representation
Without tests:
┌─────────────────────────────────────┐
│ │
│ S (1M programs) │
│ │
│ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ │
│ ● ● (10 correct) │ ← C is tiny subset
│ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ │
│ │
└─────────────────────────────────────┘
LLM picks randomly: 0.001% chance of correct
With tests:
┌─────────────────────────────────────┐
│ S (1M programs) │
│ ┌──────────────────────┐ │
│ │ T₁ (100K programs) │ │
│ │ ┌──────────────┐ │ │
│ │ │ T₂ (10K) │ │ │
│ │ │ ┌──────┐ │ │ │
│ │ │ │ T₃ │ │ │ │
│ │ │ │ (1K) │ │ │ │
│ │ │ │ ┌─┐ │ │ │ │
│ │ │ │ │C│ │ │ │ │ ← C overlaps heavily
│ │ │ │ └─┘ │ │ │ │ with constrained
│ │ │ └──────┘ │ │ │ space
│ │ └──────────────┘ │ │
│ └──────────────────────┘ │
└─────────────────────────────────────┘
LLM picks from inner circle: 60% chance of correct
Why More Tests = Better Constraints
No tests:
S_constrained = S
|S_constrained| = 1,000,000
P(correct) = 10/1,000,000 = 0.001%
1 test:
S_constrained = S ∩ T₁
|S_constrained| = 100,000
P(correct) = 8/100,000 = 0.008%
3 tests:
S_constrained = S ∩ T₁ ∩ T₂ ∩ T₃
|S_constrained| = 1,000
P(correct) = 5/1,000 = 0.5%
5 tests:
S_constrained = S ∩ T₁ ∩ T₂ ∩ T₃ ∩ T₄ ∩ T₅
|S_constrained| = 50
P(correct) = 30/50 = 60%
The more tests you add, the closer $S_{\text{constrained}}$ gets to $C$.
Practical Application
When writing tests for LLM prompting:
- Start with type constraints (reduce $S$ by 90%)
- Add behavior tests (reduce by another 90%)
- Add edge case tests (reduce by another 90%)
- Add validation tests (reduce by another 90%)
Each test multiplicatively shrinks the space:
1M × 0.1 × 0.1 × 0.1 × 0.1 = 100 programs
From: 1,000,000 possible implementations
To: 100 highly-constrained implementations
Success rate improves: 0.001% → 30-60%
The Key Insight
Without tests: LLM samples from $S$ (huge, mostly wrong)
With tests: LLM samples from $S \cap T_1 \cap T_2 \cap \cdots \cap T_n$ (small, mostly correct)
This is why test-driven prompting works: mathematics guarantees better results.
Related Concepts
- Test-Based Regression Patching – Fix bugs by writing tests first
- Quality Gates as Information Filters – How tests reduce state space
- Verification Sandwich Pattern – Combining pre- and post-generation constraints
- Integration Testing Patterns – Why integration tests work better for LLMs
- Test Custom Infrastructure – Test your testing infrastructure to avoid cascading failures
- Property-Based Testing for LLM-Generated Code – Catch edge cases with invariant-based testing
- Automated Flaky Test Detection – Diagnose intermittent test failures systematically
- Type-Driven Development – Using types as compile-time specifications
- Information Theory Coding Agents – Theoretical foundation for entropy reduction
- Few-Shot Prompting with Project Examples – Complement tests with concrete code examples to constrain LLM output
References
- Test-Driven Development – Kent Beck – The original TDD book that inspired test-driven prompting
- Testing Library – Guiding Principles – Philosophy of testing behavior over implementation
- Fast-check: Property-Based Testing – Property-based testing library for TypeScript

