Claude Code is a harness around an LLM. Your job is to build a harness around Claude Code.
## The Mental Model

Treat this as a signal processing problem.

The LLM is a noisy channel. Each layer of the harness:

- Increases the signal-to-noise ratio
- Constrains the solution space
- Provides feedback loops for self-correction

## The Four Layers

The inner three layers nest as shown below; Layer 4 (closed-loop optimization, covered later) wraps the whole stack.

```
┌─────────────────────────────────────────────────────────────┐
│ Meta Engineering                                            │
│ - Claude Agent Scripts for specific workflows               │
│ - Tests for tests                                           │
│ - Tests for telemetry                                       │
│ - Bespoke infrastructure that speeds up features/tests      │
│ - Agent Swarm Tactics                                       │
│ ┌─────────────────────────────────────────────────────┐     │
│ │ Cultural - Repository Engineering                   │     │
│ │ - Metrics/logs/traces (OTEL + Jaeger)               │     │
│ │ - Testing infrastructure                            │     │
│ │ - Prod/Dev parity: Dockerised setup                 │     │
│ │ - Code/package structure (DDD)                      │     │
│ │ ┌─────────────────────────────────────────────┐     │     │
│ │ │ Coding Agent (Claude Code)                  │     │     │
│ │ │ ┌────────────┬─────┬──────────────┐         │     │     │
│ │ │ │ claude.md  │ LLM │ claude hooks │         │     │     │
│ │ │ │ setup      │     │              │         │     │     │
│ │ │ └────────────┴─────┴──────────────┘         │     │     │
│ │ └─────────────────────────────────────────────┘     │     │
│ └─────────────────────────────────────────────────────┘     │
└─────────────────────────────────────────────────────────────┘
```
## Layer 1: Coding Agent (Claude Code)

The innermost layer. Claude Code is already a harness around the raw LLM.

| Component | Purpose |
|---|---|
| claude.md setup | Onboard the agent with WHAT/WHY/HOW |
| LLM | The raw model (noisy channel) |
| claude hooks | Pre/post processing, linting, formatting |

Your job here: configure claude.md and hooks to maximise signal.
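As a concrete sketch, a hooks block in `.claude/settings.json` that auto-formats after every file edit might look like this. The shape follows Claude Code's documented hooks configuration; the `bunx prettier` command is an assumption about this repository's tooling.

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "bunx prettier --write ." }
        ]
      }
    ]
  }
}
```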
## Layer 2: Cultural – Repository Engineering

The environment the agent operates in. Better environment = better signal.
### Observability Stack

```yaml
# Metrics, logs, traces with OTEL + Jaeger
services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686" # UI
      - "4317:4317"   # OTLP gRPC
  otel-collector:
    image: otel/opentelemetry-collector:latest
    volumes:
      - ./otel-config.yaml:/etc/otel/config.yaml
```
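The compose file mounts `otel-config.yaml`; a minimal sketch of that collector config, assuming a recent collector image and Jaeger's native OTLP ingest, wires up a single trace pipeline:

```yaml
# otel-config.yaml — minimal trace pipeline (sketch)
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  otlp:
    endpoint: jaeger:4317 # Jaeger accepts OTLP gRPC natively
    tls:
      insecure: true # acceptable for local dev only

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
```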
### Testing Infrastructure

```
# Tests that give clear signal
scripts/
├── test.sh      # Context-efficient test runner
├── lint.sh      # Auto-fixing linters
└── typecheck.sh # Type verification
```

See: Context-Efficient Backpressure
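A sketch of the `test.sh` pattern, assuming a bun toolchain (per the Dockerfile below): one line of signal per passing suite, full output only on failure.

```bash
#!/usr/bin/env bash
# run_silent: suppress output on success, surface everything on failure.
run_silent() {
  local label="$1"; shift
  local output
  if output="$("$@" 2>&1)"; then
    echo "PASS: $label"                     # one line, no noise
  else
    local code=$?
    echo "FAIL: $label (exit code $code)"
    echo "$output"                          # full detail only when it matters
    return "$code"
  fi
}

run_silent "unit tests" bun test
run_silent "typecheck" bunx tsc --noEmit
```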
### Prod/Dev Parity

```dockerfile
# Dockerised setup for maximum parity
FROM oven/bun:1 AS base
WORKDIR /app

# Same environment everywhere
COPY package.json bun.lockb ./
RUN bun install --frozen-lockfile
```
### Code Structure (DDD)

```
src/
├── domain/         # Business logic (pure)
│   ├── entities/
│   └── value-objects/
├── application/    # Use cases
│   └── services/
├── infrastructure/ # External concerns
│   ├── database/
│   └── api/
└── presentation/   # UI/API layer
```

DDD gives LLMs clear boundaries. Each layer has explicit responsibilities.
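A minimal illustration of the boundary, using a hypothetical `Order` entity: the domain stays pure, and the use case depends only on an interface that infrastructure implements.

```typescript
// domain/entities/order.ts — pure business logic, no I/O
export class Order {
  constructor(readonly id: string, private prices: number[] = []) {}

  addItem(price: number) {
    if (price <= 0) throw new Error("price must be positive");
    this.prices.push(price);
  }

  total(): number {
    return this.prices.reduce((sum, p) => sum + p, 0);
  }
}

// application/services/checkout.ts — use case depends on an interface,
// never on a concrete database; infrastructure/ supplies the implementation
export interface OrderRepository {
  save(order: Order): Promise<void>;
}

export async function checkout(order: Order, repo: OrderRepository) {
  await repo.save(order);
}
```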
## Layer 3: Meta Engineering

The third layer: engineering the engineering process itself.

### Claude Agent Scripts

```
# Workflow-specific agent invocations
.claude/
├── commands/
│   ├── implement-feature.md # "Given this spec, implement..."
│   ├── fix-failing-test.md  # "This test is failing..."
│   ├── review-pr.md         # "Review this PR for..."
│   └── refactor-module.md   # "Refactor this to..."
└── hooks/
    ├── pre-commit.sh # Lint + format before commit
    └── post-edit.sh  # Run tests after edits
```
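A command file is just a prompt template. Hypothetical content for `fix-failing-test.md` might read:

```markdown
<!-- .claude/commands/fix-failing-test.md (illustrative content) -->
Run scripts/test.sh and identify the first failing test.

1. Read the failing test and the code under test.
2. Decide whether the test or the implementation is wrong.
3. Fix the root cause. Do not skip, delete, or weaken the test.
4. Re-run scripts/test.sh until it passes, then summarise the change.
```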
### Tests for Tests

```typescript
// Meta-testing: verify your test infrastructure works
describe("test infrastructure", () => {
  it("run_silent captures failures correctly", async () => {
    const result = await runSilent("failing test", "exit 1");
    expect(result.success).toBe(false);
    expect(result.output).toContain("exit code");
  });

  it("backpressure compresses passing tests", async () => {
    const result = await runSilent("passing test", "exit 0");
    expect(result.output).toBe(""); // No noise on success
  });
});
```
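`runSilent` itself can be small. One possible implementation satisfying the contract those tests assert (empty output on success, exit code in the failure output):

```typescript
import { exec } from "node:child_process";
import { promisify } from "node:util";

const execAsync = promisify(exec);

interface RunResult {
  success: boolean;
  output: string;
}

export async function runSilent(label: string, command: string): Promise<RunResult> {
  try {
    await execAsync(command);
    return { success: true, output: "" }; // backpressure: no noise on success
  } catch (err) {
    const { code, stdout, stderr } = err as { code?: number; stdout?: string; stderr?: string };
    return {
      success: false,
      output: `${label} failed with exit code ${code}\n${stdout ?? ""}${stderr ?? ""}`,
    };
  }
}
```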
### Tests for Telemetry

```typescript
// Verify observability works before you need it
describe("telemetry", () => {
  it("traces propagate through services", async () => {
    const span = tracer.startSpan("test-span");
    const traceId = span.spanContext().traceId;
    await callDownstreamService({ traceId });

    const traces = await jaeger.getTraces(traceId);
    expect(traces.spans.length).toBeGreaterThan(1);
  });
});
```
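The `jaeger` helper above is assumed; a sketch of it against Jaeger's query HTTP API on port 16686 (an internal API whose response shape may change between versions):

```typescript
const JAEGER_URL = "http://localhost:16686";

export const jaeger = {
  async getTraces(traceId: string): Promise<{ spans: unknown[] }> {
    const res = await fetch(`${JAEGER_URL}/api/traces/${traceId}`);
    if (!res.ok) throw new Error(`Jaeger query failed: ${res.status}`);
    const body = await res.json();
    // Jaeger returns { data: [{ spans: [...] }] } for a trace lookup
    return { spans: body.data?.[0]?.spans ?? [] };
  },
};
```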
### Agent Swarm Tactics

```typescript
// Parallel agent execution for large tasks
async function swarmImplementation(spec: Spec) {
  const tasks = breakdownSpec(spec);

  // Launch agents in parallel
  const results = await Promise.all(
    tasks.map((task) =>
      spawnAgent({
        prompt: task.prompt,
        scope: task.files,
        constraints: task.constraints,
      })
    )
  );

  // Merge and resolve conflicts
  return mergeResults(results);
}
```
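`spawnAgent` is assumed in the snippet above. One way to sketch it is over Claude Code's headless mode (`claude -p "<prompt>"`); the task shape and prompt plumbing here are illustrative:

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

interface AgentTask {
  prompt: string;
  scope: string[];       // files the agent may touch
  constraints: string[]; // extra instructions appended to the prompt
}

export async function spawnAgent(task: AgentTask): Promise<string> {
  const prompt = [
    task.prompt,
    `Only modify these files: ${task.scope.join(", ")}`,
    ...task.constraints,
  ].join("\n");

  // Headless invocation; each agent gets a fresh context window
  const { stdout } = await execFileAsync("claude", ["-p", prompt]);
  return stdout;
}
```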
## Layer 4: Closed-Loop Optimization

The outermost layer. Systems that optimize themselves.

### Telemetry as Control Input

Instead of passive monitoring, use telemetry as active feedback:

```
Service under load
        ↓
Telemetry captured (memory, latency, errors)
        ↓
Constraints evaluated
        ↓
Violations detected?
        ↓
Agent proposes fix → Apply → Re-test
        ↓
Loop until constraints satisfied
```
### Constraint-Driven Optimization

```yaml
# constraints.yaml
performance:
  memory_max_mb: 300
  p99_latency_ms: 100
  heap_growth_slope: 0 # No leaks

triggers:
  on_violation: spawn_optimizer_agent
  max_iterations: 5
  escalate_to_human: true
```
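A sketch of the `evaluateConstraints` step the pipeline below relies on, mirroring the performance block above (the names and metric shape are illustrative):

```typescript
interface Metrics {
  memoryMb: number;
  p99LatencyMs: number;
  heapGrowthSlope: number;
}

interface Violation {
  constraint: string;
  actual: number;
  limit: number;
}

// Mirrors the performance section of constraints.yaml
const constraints = { memoryMaxMb: 300, p99LatencyMs: 100, heapGrowthSlope: 0 };

export function evaluateConstraints(m: Metrics, c = constraints): Violation[] {
  const violations: Violation[] = [];
  if (m.memoryMb > c.memoryMaxMb)
    violations.push({ constraint: "memory_max_mb", actual: m.memoryMb, limit: c.memoryMaxMb });
  if (m.p99LatencyMs > c.p99LatencyMs)
    violations.push({ constraint: "p99_latency_ms", actual: m.p99LatencyMs, limit: c.p99LatencyMs });
  if (m.heapGrowthSlope > c.heapGrowthSlope)
    violations.push({ constraint: "heap_growth_slope", actual: m.heapGrowthSlope, limit: c.heapGrowthSlope });
  return violations;
}
```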
### Self-Healing Pipeline

```typescript
async function optimizationLoop(service: Service) {
  const MAX_ITERATIONS = 5; // from constraints.yaml: triggers.max_iterations

  for (let i = 0; i < MAX_ITERATIONS; i++) {
    const metrics = await captureMetrics(service);
    const violations = evaluateConstraints(metrics, constraints);

    if (violations.length === 0) {
      return { status: "healthy" };
    }

    const fix = await optimizerAgent.propose(violations);
    await applyFix(fix);
    await runTests();
  }

  return { status: "escalated" }; // escalate_to_human after max iterations
}
```
This is control theory applied to software. The system continuously measures, evaluates, and corrects itself.
See: Closed-Loop Telemetry-Driven Optimization
## The Four Layers Summary
| Layer | Focus | Signal Contribution |
|---|---|---|
| Layer 1: Claude Code | Agent configuration | Prompt quality |
| Layer 2: Repository | Environment setup | Environmental clarity |
| Layer 3: Meta Engineering | Process automation | Workflow efficiency |
| Layer 4: Closed-Loop | Self-optimization | Continuous improvement |
## Signal Processing View

```
Input (your intent)
           │
           ▼
┌─────────────────────┐
│  Meta Engineering   │ ← Shapes the problem space
│  (Scripts, Swarms)  │
└─────────────────────┘
           │
           ▼
┌─────────────────────┐
│  Repository Layer   │ ← Provides environmental signal
│  (OTEL, Tests, DDD) │
└─────────────────────┘
           │
           ▼
┌─────────────────────┐
│     Claude Code     │ ← Harness around LLM
│ (claude.md, hooks)  │
└─────────────────────┘
           │
           ▼
┌─────────────────────┐
│         LLM         │ ← Noisy channel
└─────────────────────┘
           │
           ▼
Output (code, decisions)
```

Each layer amplifies signal and attenuates noise.
## Key Insight

The LLM is the least controllable part of the system. Everything else is engineering.

Build the harness. Control what you can control. Let the LLM do what it's good at within a well-constrained environment.
## Long-Running Agent Harnesses

Source: Anthropic Engineering

For agents that span multiple context windows, the harness needs additional infrastructure.

### The Core Problem

"Each new session begins with no memory of what came before."

This mirrors engineers working in shifts without handoff documentation: gaps in continuity cause repeated work and lost progress.
### Two-Agent Architecture

Split long-running work into specialized agents:

| Agent | Purpose |
|---|---|
| Initializer Agent | Establishes foundation: init.sh, progress files, feature lists, initial commits |
| Coding Agent | Works within constraints: one feature per session, reads progress, leaves clean state |
### Session Initialization Sequence

Each coding session follows a structured startup (sketched in the script after this list):

1. Check working directory
2. Review progress files (claude-progress.txt)
3. Examine git history
4. Run basic health checks
5. Select next feature
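A hypothetical `session-start.sh` capturing that sequence; it assumes `jq` is available and that the feature list lives in `features.json`:

```bash
#!/usr/bin/env bash
# Hypothetical session-start script following the sequence above.
set -euo pipefail

pwd                          # 1. check working directory
cat claude-progress.txt      # 2. review progress files
git log --oneline -10        # 3. examine recent git history
./scripts/test.sh            # 4. run basic health checks

# 5. select the next feature that is not yet passing
jq -r '[.features[] | select(.status != "passing")][0].id' features.json
```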
### Feature Lists: JSON Over Markdown

Use JSON format for feature tracking; it resists inadvertent modifications better than markdown.

```json
{
  "features": [
    {
      "id": "auth-001",
      "name": "User login flow",
      "status": "passing",
      "tests": ["login.spec.ts"]
    },
    {
      "id": "auth-002",
      "name": "Password reset",
      "status": "failing",
      "blockers": ["Email service not configured"]
    }
  ]
}
```

Why JSON: agents are less likely to declare premature victory when status is explicitly tracked in structured data.
### Browser Automation for Verification

Unit tests alone are insufficient. Add E2E verification with browser automation:

```typescript
// Puppeteer (e.g. via MCP) for realistic user-flow testing
import puppeteer from "puppeteer";

async function verifyFeature(feature: Feature) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  try {
    // Test the actual user flow, not just API responses
    await page.goto("http://localhost:3000/login");
    await page.type("#email", "[email protected]");
    await page.type("#password", "password");
    await page.click("button[type='submit']");

    // Verify real state change: the dashboard actually renders
    await page.waitForSelector(".dashboard", { visible: true });
  } finally {
    await browser.close();
  }
}
```
### Common Failure Modes

| Problem | Root Cause | Solution |
|---|---|---|
| Premature victory declarations | Incomplete tracking | Structured JSON feature lists |
| Buggy handoffs between sessions | Undocumented progress | Git commits + progress files every session |
| Incomplete feature marking | Insufficient verification | Browser automation for E2E testing |
| Environment setup delays | Missing scripts | Pre-written init.sh procedures |
### Progress File Pattern

```markdown
# claude-progress.txt

## Session 2024-01-15 14:30

### Completed
- [x] auth-001: User login flow
- [x] auth-003: Session persistence

### In Progress
- [ ] auth-002: Password reset (blocked: email service)

### Learnings
- Rate limiting middleware must be added before auth routes
- Test user cleanup required after each E2E test

### Next Session Should
1. Configure email service mock
2. Complete password reset feature
3. Add rate limiting tests
```
## The Compound Effect at Scale

"90% of traditional programming is becoming commoditised. The other 10%? It's now worth 1000x more." — Kent Beck

Real-world productivity data: at Every, 2 engineers produce output equivalent to a 15-person team using compound engineering practices.
### Voice-to-Feature Pipeline

Advanced compound engineering workflows:

```
Voice input (feature idea)
        ↓
Research agents (analyze codebase + best practices)
        ↓
Planning agents (generate detailed GitHub issues)
        ↓
Execution agents (parallel terminals)
        ↓
Human review (architecture, not syntax)
```

The shift: system thinking and orchestration capability now outweigh raw coding syntax knowledge.
## Related

- Writing a Good CLAUDE.md – Layer 1 configuration
- Context-Efficient Backpressure – Layer 2 testing
- 12 Factor Agents – Agent architecture principles
- FP Increases LLM Signal – Code-level signal
- RALPH Loop – Fresh context iteration for compound development
- Agent Reliability Chasm – Why verification matters

### Business Context

- Liquidation Cadence – Harnesses compound when they reduce time-to-liquidation
- Value Creation – Build harnesses that create customer value

