The Harness Is Cheaper Now

James Phoenix

Building the harness used to be overhead. Now it is cheaper than building the thing and figuring out what is wrong with it.

Author: James Phoenix | Date: March 2026


The Economics Flipped

For decades, the harness was a tax. Specs, tests, CI pipelines, type systems, linting, observability. All of it cost time upfront. Time that could have been spent shipping.

So engineers skipped it. Ship first, fix later. Move fast and break things. The math worked because writing code was the bottleneck. Every hour spent on infrastructure was an hour not spent on features.

That math no longer holds.

LLMs made implementation cheap. An agent can scaffold an API endpoint in minutes. A feature that took a day now takes an hour. The bottleneck moved.

The bottleneck is now figuring out what is wrong.


Where Time Actually Goes

Watch an engineer working with agents for a week. Track where the hours go. These are rough estimates from my own experience, not formal measurements.

Building features:          20%
Debugging agent output:     35%
Re-running failed attempts: 15%
Refactoring messy code:     20%
Context reconstruction:     10%

Roughly 80% of the time is not building. It is understanding, correcting, and recovering from incorrect output. This is the “figure out what is wrong” cost.

Now compare with a harness-first workflow:

Writing specs + tests:    30%
Agent executes from spec: 15%
Reviewing passing code:   10%
Fixing spec gaps:         15%
Extending harness:        20%
Context reconstruction:   10%

The total time is similar. But the second workflow compounds. Each spec and test you write reduces future debugging. The first workflow does not compound. Each debugging session is disposable.
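
A quick way to see the difference is to tally how much of each week leaves something behind. The sketch below uses the rough percentages above. Which activities count as durable is an assumption made for illustration (the DURABLE set), not a measurement.

```python
# Rough weekly time splits quoted above (percent of working hours).
debug_first = {
    "Building features": 20,
    "Debugging agent output": 35,
    "Re-running failed attempts": 15,
    "Refactoring messy code": 20,
    "Context reconstruction": 10,
}
harness_first = {
    "Writing specs + tests": 30,
    "Agent executes from spec": 15,
    "Reviewing passing code": 10,
    "Fixing spec gaps": 15,
    "Extending harness": 20,
    "Context reconstruction": 10,
}

# Assumption: these are the activities that leave a reusable artifact behind.
DURABLE = {"Writing specs + tests", "Fixing spec gaps", "Extending harness"}


def durable_share(split: dict[str, int]) -> int:
    """Percent of time spent on work that persists beyond the session."""
    return sum(pct for name, pct in split.items() if name in DURABLE)


if __name__ == "__main__":
    for label, split in [("debug-first", debug_first), ("harness-first", harness_first)]:
        assert sum(split.values()) == 100  # both splits should cover the whole week
        print(f"{label}: {durable_share(split)}% of time goes into durable artifacts")
```

Under that classification, the debug-first week produces nothing that carries over, while most of the harness-first week does.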


Why Debugging Is Expensive Now

Debugging was manageable when humans wrote the code. You wrote it, so you understood it. The mental model was already in your head.

Agent-generated code breaks this assumption. You did not write it. You may not understand it. The debugging cost includes:

  1. Reading code you did not write. LLM output is usually syntactically correct but often structurally unfamiliar. You have to build a mental model from scratch.

  2. Tracing intent through layers. The agent made architectural choices you did not specify. Understanding why it chose a particular pattern takes time.

  3. Finding subtle incorrectness. Agent code usually passes the happy path. The bugs live in edge cases, race conditions, and implicit assumptions. These are the hardest bugs to find.

  4. Context window waste. Every debugging cycle consumes tokens. The agent reads files, proposes fixes, tries again. A single bug can burn through 30-50% of a context window.

  5. Cascading errors. One wrong architectural choice early in a session propagates through everything the agent builds on top of it. By the time you notice, the fix is a rewrite.

The cost of debugging agent output is higher than the cost of debugging your own code. This is the key insight most engineers miss.


Why The Harness Is Cheap Now

The same LLMs that made implementation cheap also made harness construction cheap.

Harness component            Time to build (2023)   Time to build (2026)
Test suite for a module      2-4 hours              15-30 minutes
PRD for a feature            1-2 hours              20-40 minutes
Type definitions             30-60 minutes          5-10 minutes
CI pipeline                  2-4 hours              20-40 minutes
Linting rules                1-2 hours              10-20 minutes
Design doc with interfaces   2-3 hours              30-60 minutes

Agents are excellent at generating constraints. Types, tests, schemas, validation logic. This is structured, pattern-heavy work. Exactly what LLMs are good at.

The harness that used to cost a week now costs a day. Sometimes less.
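
To put a rough number on the table, the sketch below converts each range to minutes, takes the midpoint, and divides. The figures are the estimates from the table. Using midpoints is an assumption made for illustration.

```python
# (2023 range, 2026 range) in minutes, copied from the table above.
BUILD_TIMES = {
    "Test suite for a module": ((120, 240), (15, 30)),
    "PRD for a feature": ((60, 120), (20, 40)),
    "Type definitions": ((30, 60), (5, 10)),
    "CI pipeline": ((120, 240), (20, 40)),
    "Linting rules": ((60, 120), (10, 20)),
    "Design doc with interfaces": ((120, 180), (30, 60)),
}


def midpoint(time_range: tuple[int, int]) -> float:
    """Midpoint of a (low, high) range in minutes."""
    low, high = time_range
    return (low + high) / 2


def speedup(component: str) -> float:
    """How many times faster the component is to build, by range midpoints."""
    before, after = BUILD_TIMES[component]
    return midpoint(before) / midpoint(after)


if __name__ == "__main__":
    for name in BUILD_TIMES:
        print(f"{name}: about {speedup(name):.1f}x faster")
```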


The Crossover Point

There is a complexity threshold where harness-first becomes strictly cheaper.

Total Cost
    ↑
    │  ╲                    ╱
    │   ╲  Debug-first    ╱
    │    ╲              ╱
    │     ╲           ╱
    │      ╲        ╱
    │       ╲     ╱  ← Crossover
    │        ╲  ╱
    │         ╳
    │       ╱  ╲
    │     ╱     ╲
    │   ╱        ╲  Harness-first
    │ ╱            ╲
    └──────────────────────────────→ System Complexity

For trivial tasks (a button colour change, a copy edit), skip the harness. The debug cost is near zero.

For anything involving state, multiple files, or business logic, the crossover has already happened. The harness is cheaper.

The threshold used to be around “medium complexity” projects. With current model capabilities, it has shifted left. It now sits somewhere around “anything that touches more than 2-3 files.”
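
One way to picture the crossover is a toy linear model: debug-first pays a cost per file touched, harness-first pays a fixed setup cost plus a smaller cost per file. All three parameters below are hypothetical, picked only so the crossover lands between 2 and 3 files as described above. The 30-minute setup echoes the minimum viable harness estimate given later in this post.

```python
def debug_first_cost(files: int, debug_per_file: float = 25.0) -> float:
    """Minutes spent when every touched file carries debugging overhead."""
    return debug_per_file * files


def harness_first_cost(files: int, setup: float = 30.0, verify_per_file: float = 12.0) -> float:
    """Minutes spent when a fixed harness is built first and each file is cheaper to verify."""
    return setup + verify_per_file * files


def crossover_files(setup: float = 30.0, debug_per_file: float = 25.0,
                    verify_per_file: float = 12.0) -> float:
    """Number of files at which both approaches cost the same."""
    if debug_per_file <= verify_per_file:
        raise ValueError("no crossover: debugging is never more expensive per file")
    return setup / (debug_per_file - verify_per_file)


if __name__ == "__main__":
    print(f"crossover at about {crossover_files():.1f} files")
    for files in (1, 3, 10):
        print(files, debug_first_cost(files), harness_first_cost(files))
```

With these made-up numbers, a one-file change is cheaper without the harness and a three-file change is cheaper with it. Change the parameters and the crossover moves, which is the point of the section.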


The Compound Effect

Here is the part that makes the economics asymmetric.

Debug-first costs are linear. Each new feature incurs roughly the same debugging overhead. Session N is no cheaper than session 1. You are paying rent.

Harness-first costs are front-loaded and diminishing. Each spec, test, and constraint you write:

  • Reduces future debugging for that module
  • Gives future agents better context
  • Catches regressions automatically
  • Survives context window resets

Cumulative Cost Over Time

Debug-first:    ╱ (linear, every session costs the same)
               ╱
              ╱
             ╱
            ╱

Harness-first: ╱ (steep start, then flattens)
              ╱
             ╱
            ·
           ·
          ·  ·  ·  ·  ·  (diminishing marginal cost)

After 10 features, the harness-first engineer has a test suite, type system, and spec library that makes feature 11 nearly free to verify. The debug-first engineer is still paying full price on feature 11.

This is compound leverage. The harness pays dividends on every future session.
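
The two curves can be sketched with a small model. Debug-first charges the same amount for every feature. Harness-first starts higher and decays toward a floor. Every number here is a hypothetical parameter chosen for illustration: 120 minutes per debug-first feature, and a harness-first cost that starts at 150 minutes and closes 30% of the gap to a 30-minute floor with each feature.

```python
from itertools import accumulate
from typing import Callable, Optional


def debug_first_marginal(n: int, per_feature: float = 120.0) -> float:
    """Cost of feature n under debug-first: flat, no matter how many came before."""
    return per_feature


def harness_first_marginal(n: int, first: float = 150.0, floor: float = 30.0,
                           decay: float = 0.7) -> float:
    """Cost of feature n under harness-first: steep start, decaying toward a floor."""
    return floor + (first - floor) * decay ** (n - 1)


def cumulative(marginal: Callable[[int], float], features: int) -> list[float]:
    """Running total of cost after each of the first `features` features."""
    return list(accumulate(marginal(n) for n in range(1, features + 1)))


def break_even_feature(max_features: int = 50) -> Optional[int]:
    """First feature after which harness-first has cost less in total, if any."""
    debug = cumulative(debug_first_marginal, max_features)
    harness = cumulative(harness_first_marginal, max_features)
    for n, (d, h) in enumerate(zip(debug, harness), start=1):
        if h < d:
            return n
    return None


if __name__ == "__main__":
    print("break-even at feature", break_even_feature())
    print("feature 11, debug-first:", debug_first_marginal(11))
    print("feature 11, harness-first:", round(harness_first_marginal(11), 1))
```

The exact break-even depends entirely on the parameters. The shape is what matters: one curve stays linear, the other flattens.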


The Signal Processing Argument

Treat the LLM as a noisy channel. Every layer of harness increases signal-to-noise ratio.

Without harness:

Intent → LLM → Output (high noise)
                  ↓
             Debug (expensive)
                  ↓
             Re-run (more noise)
                  ↓
             Debug again

With harness:

Intent → Spec → LLM → Output (constrained)
                         ↓
                    Tests (automatic verification)
                         ↓
                    Pass? → Ship
                    Fail? → Agent fixes against spec (cheap)

The harness constrains the output space before generation. Debugging constrains it after. Pre-generation constraints are cheaper because they prevent entire categories of errors. Post-generation debugging finds them one at a time.
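
The "with harness" loop can be sketched in a few lines. This is a minimal illustration, not a real agent integration: `stub_agent` is a hypothetical stand-in for an LLM call, and the spec and checks describe a made-up `safe_div` function.

```python
from typing import Callable, Optional

Implementation = Callable[[int, int], int]
Check = Callable[[Implementation], bool]
Agent = Callable[[str, list[str]], Implementation]


def failing_checks(candidate: Implementation, checks: dict[str, Check]) -> list[str]:
    """Names of the checks the candidate fails; an exception counts as a failure."""
    failed = []
    for name, check in checks.items():
        try:
            ok = check(candidate)
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    return failed


def run_with_harness(agent: Agent, spec: str, checks: dict[str, Check],
                     max_attempts: int = 3) -> Optional[Implementation]:
    """Generate, verify, and feed failures back until the checks pass or attempts run out."""
    failures: list[str] = []
    for _ in range(max_attempts):
        candidate = agent(spec, failures)
        failures = failing_checks(candidate, checks)
        if not failures:
            return candidate  # Pass -> ship
    return None  # still failing after max_attempts -> hand back to a human


# Hypothetical spec and checks for illustration.
SPEC = "safe_div(a, b) returns a // b, and returns 0 when b is 0."
CHECKS: dict[str, Check] = {
    "happy_path": lambda f: f(6, 3) == 2,
    "divide_by_zero_returns_zero": lambda f: f(1, 0) == 0,
}


def stub_agent(spec: str, failures: list[str]) -> Implementation:
    """Stand-in for an LLM: the first draft misses the zero case, the retry fixes it."""
    if not failures:
        return lambda a, b: a // b
    return lambda a, b: a // b if b != 0 else 0


if __name__ == "__main__":
    impl = run_with_harness(stub_agent, SPEC, CHECKS)
    print("shipped" if impl is not None else "escalate to a human")
```

The failure list is the only thing fed back to the agent. It is a much narrower signal than a human re-reading the output and guessing what went wrong.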


What Counts As Harness

Not everything needs to be formal. The harness is any artifact that constrains agent output before or during generation.

Artifact                             Cost         Leverage
A 10-line PRD                        5 minutes    Prevents wrong feature
Type definitions                     10 minutes   Prevents wrong interfaces
3 acceptance tests                   15 minutes   Prevents wrong behaviour
A design doc with file list          20 minutes   Prevents wrong architecture
Invariants (“X must never happen”)   5 minutes    Prevents dangerous failures
Example input/output pairs           10 minutes   Prevents misinterpretation

The minimum viable harness for any non-trivial task: a short PRD, types, and 3 tests. This takes 30 minutes and saves hours.
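
As a rough picture of what that looks like, here is a minimum viable harness for a hypothetical discount feature: a PRD as a comment, one type, and three tests. The feature, the names, and the `apply_discount` body are all invented for illustration. In practice the body is what the agent would write against the rest.

```python
from dataclasses import dataclass

# PRD (hypothetical): apply a percentage discount to an order total.
# - Totals are integer cents.
# - percent must be between 0 and 100 inclusive; anything else is rejected.
# - Invariant: the discounted total must never be negative.


@dataclass(frozen=True)
class Order:
    total_cents: int


def apply_discount(order: Order, percent: int) -> Order:
    """Stand-in for the implementation the agent would produce from the spec."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return Order(total_cents=order.total_cents * (100 - percent) // 100)


def test_happy_path() -> None:
    assert apply_discount(Order(2000), 10) == Order(1800)


def test_edge_case_full_discount() -> None:
    # Invariant check: a 100% discount bottoms out at zero, never below.
    assert apply_discount(Order(2000), 100).total_cents == 0


def test_failure_mode_rejects_bad_percent() -> None:
    try:
        apply_discount(Order(2000), 150)
    except ValueError:
        return
    raise AssertionError("expected ValueError for percent > 100")


if __name__ == "__main__":
    test_happy_path()
    test_edge_case_full_discount()
    test_failure_mode_rejects_bad_percent()
    print("all harness checks pass")
```

The tests follow the happy path, edge case, failure mode split. They are plain functions here so the file runs on its own, and a test runner would pick them up the same way.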


The Objection: “But I Ship Faster Without It”

You ship the first version faster. You do not ship the correct version faster.

The perceived speed of skipping the harness is real on day 1. By day 5, you are debugging. By day 10, you are refactoring. By day 20, you have rewritten the module.

The harness-first engineer is slower on day 1 and faster on every subsequent day. The total time to a correct, maintainable feature is lower.

This is the same argument that was made for test-driven development in the early 2000s. The difference is that agents make both the harness and the implementation cheap. There is no longer a tradeoff. The harness is just cheaper.


Practical Protocol

For any task that touches more than 2-3 files:

  1. Write the spec (5-20 minutes). What does this feature do? What are the constraints? What must never happen?

  2. Define the interfaces (5-10 minutes). Types, function signatures, data shapes. Let the agent generate these, review them.

  3. Write 3-5 tests (10-20 minutes). Happy path, one edge case, one failure mode. Agent can generate these from the spec.

  4. Run the agent against the spec + tests. Implementation is now constrained. The agent has a target to hit and a way to verify it hit the target.

  5. Review the output. If tests pass, review is fast. You are checking architecture and untested edge cases, not baseline correctness. The tests already verified that.

Total harness cost: 20-50 minutes.
Expected debugging cost without harness: 1-3 hours.

The math is not close.
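
The totals come straight from adding up the step estimates. A short sketch of the arithmetic, using the ranges from steps 1-3 and the 1-3 hour debugging estimate (the midpoint comparison is an added illustration):

```python
# Time ranges in minutes for the harness-building steps of the protocol above.
HARNESS_STEPS = {
    "Write the spec": (5, 20),
    "Define the interfaces": (5, 10),
    "Write 3-5 tests": (10, 20),
}
DEBUG_WITHOUT_HARNESS = (60, 180)  # the 1-3 hour estimate, in minutes


def total_range(steps: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the low ends and the high ends of every step's range."""
    low = sum(lo for lo, _ in steps.values())
    high = sum(hi for _, hi in steps.values())
    return low, high


def midpoint(time_range: tuple[int, int]) -> float:
    """Midpoint of a (low, high) range in minutes."""
    return (time_range[0] + time_range[1]) / 2


if __name__ == "__main__":
    harness = total_range(HARNESS_STEPS)
    print(f"harness: {harness[0]}-{harness[1]} minutes, midpoint {midpoint(harness)}")
    print(f"debugging: midpoint {midpoint(DEBUG_WITHOUT_HARNESS)} minutes")
```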


The Meta

The harness used to be something you built after you could afford it. A luxury for mature teams with time to spare.

Now the harness is the cheapest path to correct software. Agents made implementation nearly free. They did not make correctness free. The harness is how you buy correctness, and the price just dropped several-fold.

Engineers who skip the harness are optimising for 2023 economics. The constraint has moved. Build the harness first.

