Building the Harness Around Claude Code

James Phoenix

Claude Code is a harness around an LLM. Your job is to build a harness around Claude Code.

The Mental Model

Treat this as a signal processing problem.

The LLM is a noisy channel. Each layer of harness:

  • Increases signal-to-noise ratio
  • Constrains the solution space
  • Provides feedback loops for self-correction


The Three Layers

┌─────────────────────────────────────────────────────────────┐
│                    Meta Engineering                         │
│  - Claude Agent Scripts for specific workflows              │
│  - Tests for tests                                          │
│  - Tests for telemetry                                      │
│  - Bespoke infrastructure that speeds up features/tests     │
│  - Agent Swarm Tactics                                      │
│  ┌─────────────────────────────────────────────────────┐   │
│  │         Cultural - Repository Engineering            │   │
│  │  - Metrics/logs/traces (OTEL + Jaeger)              │   │
│  │  - Testing infrastructure                            │   │
│  │  - Prod/Dev parity: Dockerised setup                │   │
│  │  - Code/package structure (DDD)                      │   │
│  │  ┌─────────────────────────────────────────────┐    │   │
│  │  │        Coding Agent (Claude Code)            │    │   │
│  │  │  ┌────────────┬─────┬──────────────┐        │    │   │
│  │  │  │claude.md   │ LLM │ claude hooks │        │    │   │
│  │  │  │setup       │     │              │        │    │   │
│  │  │  └────────────┴─────┴──────────────┘        │    │   │
│  │  └─────────────────────────────────────────────┘    │   │
│  └─────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘

Layer 1: Coding Agent (Claude Code)

The innermost layer. Claude Code is already a harness around the raw LLM.

Component        Purpose
claude.md setup  Onboard the agent with WHAT/WHY/HOW
LLM              The raw model (noisy channel)
claude hooks     Pre/post processing, linting, formatting

Your job here: Configure claude.md and hooks to maximise signal.

See: Writing a Good CLAUDE.md


Layer 2: Cultural – Repository Engineering

The environment the agent operates in. Better environment = better signal.

Observability Stack

# Metrics, logs, traces with OTEL + Jaeger
services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686"  # UI
      - "4317:4317"    # OTLP gRPC

  otel-collector:
    image: otel/opentelemetry-collector:latest
    volumes:
      - ./otel-config.yaml:/etc/otel/config.yaml
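
On the application side, a Node service points its OTEL SDK at this stack. A minimal sketch, assuming @opentelemetry/sdk-node and the OTLP gRPC port exposed above (the service name is illustrative):

// tracing.ts: wire a service to the Jaeger/collector endpoint above
import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-grpc";

const sdk = new NodeSDK({
  serviceName: "api", // illustrative service name
  traceExporter: new OTLPTraceExporter({
    url: "http://localhost:4317", // OTLP gRPC port from the compose file
  }),
});

sdk.start();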

Testing Infrastructure

# Tests that give clear signal
scripts/
├── test.sh          # Context-efficient test runner
├── lint.sh          # Auto-fixing linters
└── typecheck.sh     # Type verification

See: Context-Efficient Backpressure

Prod/Dev Parity

# Dockerised setup for maximum parity
# Official Bun image (a node:20-slim base has no bun for the install below)
FROM oven/bun:1 AS base
WORKDIR /app

# Same environment everywhere
COPY package.json bun.lockb ./
RUN bun install --frozen-lockfile

Code Structure (DDD)

src/
├── domain/           # Business logic (pure)
│   ├── entities/
│   └── value-objects/
├── application/      # Use cases
│   └── services/
├── infrastructure/   # External concerns
│   ├── database/
│   └── api/
└── presentation/     # UI/API layer

DDD gives LLMs clear boundaries. Each layer has explicit responsibilities.
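
As a sketch of what those boundaries look like in TypeScript (the Order and CheckoutService names are illustrative, not from any particular repo):

// domain/entities/order.ts: pure business logic, no I/O
export class Order {
  constructor(
    public readonly id: string,
    private readonly items: { sku: string; price: number }[]
  ) {}

  total(): number {
    return this.items.reduce((sum, item) => sum + item.price, 0);
  }
}

// application/services/checkout-service.ts: a use case that depends on an
// interface, never on a concrete database client
export interface OrderRepository {
  save(order: Order): Promise<void>;
}

export class CheckoutService {
  constructor(private readonly orders: OrderRepository) {}

  async checkout(order: Order): Promise<number> {
    await this.orders.save(order); // infrastructure supplies the implementation
    return order.total();
  }
}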


Layer 3: Meta Engineering

The outer layer. Engineering the engineering process itself.

Claude Agent Scripts

# Workflow-specific agent invocations
.claude/
├── commands/
│   ├── implement-feature.md   # "Given this spec, implement..."
│   ├── fix-failing-test.md    # "This test is failing..."
│   ├── review-pr.md           # "Review this PR for..."
│   └── refactor-module.md     # "Refactor this to..."
└── hooks/
    ├── pre-commit.sh          # Lint + format before commit
    └── post-edit.sh           # Run tests after edits

Tests for Tests

// Meta-testing: verify your test infrastructure works
describe("test infrastructure", () => {
  it("run_silent captures failures correctly", async () => {
    const result = await runSilent("failing test", "exit 1");
    expect(result.success).toBe(false);
    expect(result.output).toContain("exit code");
  });

  it("backpressure compresses passing tests", async () => {
    const result = await runSilent("passing test", "exit 0");
    expect(result.output).toBe(""); // No noise on success
  });
});
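
These tests assume a runSilent helper that stays quiet on success and surfaces the exit code on failure. A minimal sketch using Node's child_process, with the interface inferred from the tests above:

import { exec } from "node:child_process";
import { promisify } from "node:util";

const sh = promisify(exec);

// Quiet on success, full detail (including the exit code) on failure
export async function runSilent(
  label: string,
  command: string
): Promise<{ success: boolean; output: string }> {
  try {
    await sh(command);
    return { success: true, output: "" }; // No noise on success
  } catch (err) {
    const e = err as { code?: number; stdout?: string; stderr?: string };
    return {
      success: false,
      output: `${label} failed with exit code ${e.code}\n${e.stdout ?? ""}${e.stderr ?? ""}`,
    };
  }
}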

Tests for Telemetry

// Verify observability works before you need it
import { trace } from "@opentelemetry/api";

const tracer = trace.getTracer("telemetry-tests");

describe("telemetry", () => {
  it("traces propagate through services", async () => {
    const span = tracer.startSpan("test-span");
    const traceId = span.spanContext().traceId;

    await callDownstreamService({ traceId });
    span.end(); // End the span so it gets exported

    const traces = await jaeger.getTraces(traceId);
    expect(traces.spans.length).toBeGreaterThan(1);
  });
});
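
The jaeger helper is left abstract above. A minimal sketch against the Jaeger query API exposed on port 16686 in the compose file (response shape simplified):

// Query Jaeger's HTTP API for the spans belonging to a trace
async function getTraces(traceId: string): Promise<{ spans: unknown[] }> {
  const res = await fetch(`http://localhost:16686/api/traces/${traceId}`);
  if (!res.ok) throw new Error(`Jaeger query failed: ${res.status}`);

  // The query API wraps results as { data: [ { spans: [...] } ] }
  const body = (await res.json()) as { data?: { spans: unknown[] }[] };
  return { spans: body.data?.[0]?.spans ?? [] };
}

export const jaeger = { getTraces };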

Agent Swarm Tactics

// Parallel agent execution for large tasks
async function swarmImplementation(spec: Spec) {
  const tasks = breakdownSpec(spec);

  // Launch agents in parallel
  const results = await Promise.all(
    tasks.map((task) =>
      spawnAgent({
        prompt: task.prompt,
        scope: task.files,
        constraints: task.constraints,
      })
    )
  );

  // Merge and resolve conflicts
  return mergeResults(results);
}
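
breakdownSpec, spawnAgent and mergeResults are left abstract here. A hypothetical spawnAgent could shell out to Claude Code in non-interactive print mode (claude -p), roughly like this (the option types are assumptions):

import { execFile } from "node:child_process";
import { promisify } from "node:util";

const exec = promisify(execFile);

// One task = one scoped, non-interactive Claude Code invocation
async function spawnAgent(opts: {
  prompt: string;
  scope: string[];
  constraints: string[];
}): Promise<string> {
  const prompt = [
    opts.prompt,
    `Only touch these files: ${opts.scope.join(", ")}`,
    `Constraints: ${opts.constraints.join("; ")}`,
  ].join("\n");

  // `claude -p` runs Claude Code in print mode and returns its output
  const { stdout } = await exec("claude", ["-p", prompt]);
  return stdout;
}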

Layer 4: Closed-Loop Optimization

The outermost layer, wrapping the three in the diagram above: systems that optimize themselves.

Telemetry as Control Input

Instead of passive monitoring, use telemetry as active feedback:

Service under load
        ↓
Telemetry captured (memory, latency, errors)
        ↓
Constraints evaluated
        ↓
Violations detected?
        ↓
Agent proposes fix → Apply → Re-test
        ↓
Loop until constraints satisfied

Constraint-Driven Optimization

# constraints.yaml
performance:
  memory_max_mb: 300
  p99_latency_ms: 100
  heap_growth_slope: 0  # No leaks

triggers:
  on_violation: spawn_optimizer_agent
  max_iterations: 5
  escalate_to_human: true

Self-Healing Pipeline

async function optimizationLoop(service: Service) {
  // Bounded by max_iterations from constraints.yaml above
  for (let i = 0; i < constraints.triggers.max_iterations; i++) {
    const metrics = await captureMetrics(service);
    const violations = evaluateConstraints(metrics, constraints);

    if (violations.length === 0) {
      return { status: 'healthy' };
    }

    const fix = await optimizerAgent.propose(violations);
    await applyFix(fix);
    await runTests();
  }

  // Iteration budget exhausted: escalate_to_human
  return { status: 'escalate' };
}
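
The helpers are left abstract above. As a sketch, evaluateConstraints might compare captured metrics against the parsed constraints.yaml like this (the metric field names are assumptions):

interface Metrics {
  memoryMb: number;
  p99LatencyMs: number;
  heapGrowthSlope: number;
}

interface Violation {
  constraint: string;
  limit: number;
  actual: number;
}

function evaluateConstraints(metrics: Metrics, constraints: any): Violation[] {
  const perf = constraints.performance;
  const violations: Violation[] = [];

  // Each check mirrors a key under `performance:` in constraints.yaml
  if (metrics.memoryMb > perf.memory_max_mb)
    violations.push({ constraint: "memory_max_mb", limit: perf.memory_max_mb, actual: metrics.memoryMb });
  if (metrics.p99LatencyMs > perf.p99_latency_ms)
    violations.push({ constraint: "p99_latency_ms", limit: perf.p99_latency_ms, actual: metrics.p99LatencyMs });
  if (metrics.heapGrowthSlope > perf.heap_growth_slope)
    violations.push({ constraint: "heap_growth_slope", limit: perf.heap_growth_slope, actual: metrics.heapGrowthSlope });

  return violations;
}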

This is control theory applied to software. The system continuously measures, evaluates, and corrects itself.

See: Closed-Loop Telemetry-Driven Optimization


The Four Layers Summary

Layer                      Focus                Signal Contribution
Layer 1: Claude Code       Agent configuration  Prompt quality
Layer 2: Repository        Environment setup    Environmental clarity
Layer 3: Meta Engineering  Process automation   Workflow efficiency
Layer 4: Closed-Loop       Self-optimization    Continuous improvement

Signal Processing View

Input (your intent)
    │
    ▼
┌─────────────────────┐
│  Meta Engineering   │  ← Shapes the problem space
│  (Scripts, Swarms)  │
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│  Repository Layer   │  ← Provides environmental signal
│  (OTEL, Tests, DDD) │
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│  Claude Code        │  ← Harness around LLM
│  (claude.md, hooks) │
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│       LLM           │  ← Noisy channel
└─────────────────────┘
    │
    ▼
Output (code, decisions)

Each layer amplifies signal and attenuates noise.


Key Insight

The LLM is the least controllable part of the system. Everything else is engineering.

Build the harness. Control what you can control. Let the LLM do what it’s good at within a well-constrained environment.


Long-Running Agent Harnesses

Source: Anthropic Engineering

For agents that span multiple context windows, the harness needs additional infrastructure.

The Core Problem

“Each new session begins with no memory of what came before.”

This mirrors engineers working in shifts without handoff documentation—gaps in continuity cause repeated work and lost progress.

Two-Agent Architecture

Split long-running work into specialized agents:

Agent              Purpose
Initializer Agent  Establishes foundation: init.sh, progress files, feature lists, initial commits
Coding Agent       Works within constraints: one feature per session, reads progress, leaves clean state

Session Initialization Sequence

Each coding session follows a structured startup (a minimal script sketch follows the list):

1. Check working directory
2. Review progress files (claude-progress.txt)
3. Examine git history
4. Run basic health checks
5. Select next feature
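
A minimal sketch of that startup sequence as a script the harness could run (paths and commands are assumptions based on this post):

// session-init.ts: the session startup checklist as code
import { execSync } from "node:child_process";
import { existsSync, readFileSync } from "node:fs";

function initSession() {
  // 1. Check working directory
  console.log("cwd:", process.cwd());

  // 2. Review progress files
  if (existsSync("claude-progress.txt")) {
    console.log(readFileSync("claude-progress.txt", "utf8"));
  }

  // 3. Examine git history
  console.log(execSync("git log --oneline -10", { encoding: "utf8" }));

  // 4. Run basic health checks (assumed repo script)
  execSync("./scripts/test.sh", { stdio: "inherit" });

  // 5. Select next feature (see the feature-list helper below)
}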

Feature Lists: JSON Over Markdown

Use JSON format for feature tracking—it resists inadvertent modifications better than markdown.

{
  "features": [
    {
      "id": "auth-001",
      "name": "User login flow",
      "status": "passing",
      "tests": ["login.spec.ts"]
    },
    {
      "id": "auth-002",
      "name": "Password reset",
      "status": "failing",
      "blockers": ["Email service not configured"]
    }
  ]
}

Why JSON: Agents are less likely to declare premature victory when status is explicitly tracked in structured data.
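
A small helper for selecting the next piece of work might look like this (the features.json path and Feature type are assumptions based on the JSON above):

// features.ts: pick the next feature that is not yet passing
import { readFileSync } from "node:fs";

interface Feature {
  id: string;
  name: string;
  status: "passing" | "failing" | "todo";
  tests?: string[];
  blockers?: string[];
}

function nextFeature(path = "features.json"): Feature | undefined {
  const { features } = JSON.parse(readFileSync(path, "utf8")) as {
    features: Feature[];
  };
  // Skip anything already passing; prefer unblocked work
  return features.find((f) => f.status !== "passing" && !f.blockers?.length);
}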

Browser Automation for Verification

Unit tests alone are insufficient. Add E2E verification with browser automation:

// Puppeteer MCP for realistic user-flow testing
import puppeteer from 'puppeteer';

async function verifyFeature(feature: Feature) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Test actual user flow, not just API responses
  await page.goto('http://localhost:3000/login'); // assumed local dev URL
  await page.type('#email', '[email protected]');
  await page.type('#password', 'password');
  await page.click('button[type="submit"]');

  // Verify real state change
  await page.waitForSelector('.dashboard', { visible: true });

  await browser.close();
}

Common Failure Modes

Problem                          Root Cause                 Solution
Premature victory declarations   Incomplete tracking        Structured JSON feature lists
Buggy handoffs between sessions  Undocumented progress      Git commits + progress files at session start
Incomplete feature marking       Insufficient verification  Browser automation for E2E testing
Environment setup delays         Missing scripts            Pre-written init.sh procedures

Progress File Pattern

# claude-progress.txt

## Session 2024-01-15 14:30

### Completed
- [x] auth-001: User login flow
- [x] auth-003: Session persistence

### In Progress
- [ ] auth-002: Password reset (blocked: email service)

### Learnings
- Rate limiting middleware must be added before auth routes
- Test user cleanup required after each E2E test

### Next Session Should
1. Configure email service mock
2. Complete password reset feature
3. Add rate limiting tests

The Compound Effect at Scale

“90% of traditional programming is becoming commoditised. The other 10%? It’s now worth 1000x more.” — Kent Beck

Real-world productivity data: At Every, 2 engineers produce output equivalent to a 15-person team using compound engineering practices.

Voice-to-Feature Pipeline

Advanced compound engineering workflows:

Voice input (feature idea)
        ↓
Research agents (analyze codebase + best practices)
        ↓
Planning agents (generate detailed GitHub issues)
        ↓
Execution agents (parallel terminals)
        ↓
Human review (architecture, not syntax)

The shift: System thinking and orchestration capability now outweigh raw coding syntax knowledge.

