Software was always written in a loop: change something, run it, check the result, repeat. The last five years did not replace that loop. They kept widening the thing inside one turn of it, from a single line to a recurring workflow, and moved me one rung up the stack each time.
Author: James Phoenix | Date: June 2026
The loop never changed; the unit of work did
Test-driven development made the loop explicit around a unit of behaviour: write a failing test, make it pass, refactor. Behaviour-driven development and acceptance testing widened the same idea toward product intent. CI pushed it into the delivery pipeline. The shape stayed constant the whole time. Define the next bit of intent, run the system, check the result, tighten the design.
AI is widening that loop again. The thing inside one iteration used to be a line, a function, or a failing test. Now it can be a task, a pull request, a migration, or a recurring workflow. What keeps changing is not the loop but the size of the work inside it and where I apply judgment.
The progression is additive. Each era keeps the layer below it and adds a new control surface on top:
code completion -> prompt loop -> repo context -> harness -> supervised loop

1. Autocomplete (2021-2022)
Autocomplete put the model inside the editor. GitHub Copilot made it mainstream by drawing context from the code being edited and suggesting whole lines or functions. Cursor Tab is the same era: the model predicts the next edit and I accept or reject it as I type.
The loop still lived in my hands. I typed, inspected the suggestion, accepted or rejected, continued. The benefit was speed: less boilerplate, faster movement through familiar code. The limit was scope. The model helped with the next edit, but it did not own the task.
Added: model, local file context, inline completion.
2. Prompt Engineering (2022-2023)
The next step moved from completion to task steering. ReAct gave agents a primitive that still matters: reason, act, observe, repeat. The model could think about a step, call a tool, read the result, and continue. AutoGPT made the idea feel autonomous. Instead of asking for one answer, I gave the system a goal and let it prompt itself.
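The reason, act, observe cycle is small enough to sketch directly. This is a minimal illustration, not ReAct's reference implementation: `model` stands in for an LLM call and is scripted here, and `Step`, `react_loop`, and the `max_steps` budget are hypothetical names. The budget is the crude stop condition that bare prompt loops lacked; without it, a model that never answers would loop forever.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    """One model turn: a thought, then either a tool call or a final answer."""
    thought: str
    tool: Optional[str] = None
    tool_input: str = ""
    answer: Optional[str] = None


def react_loop(model: Callable[[list], Step],
               tools: dict,
               goal: str,
               max_steps: int = 5) -> Optional[str]:
    """Reason, act, observe, repeat until the model answers or the budget runs out."""
    transcript = [f"Goal: {goal}"]
    for _ in range(max_steps):
        step = model(transcript)                           # reason
        transcript.append(f"Thought: {step.thought}")
        if step.answer is not None:
            return step.answer
        observation = tools[step.tool](step.tool_input)    # act
        transcript.append(f"Observation: {observation}")   # observe
    return None  # no convergence within the step budget


def scripted_model(transcript: list) -> Step:
    # Hypothetical stand-in for an LLM: call a tool once, then answer from the observation.
    if not any(line.startswith("Observation: ") for line in transcript):
        return Step(thought="I should count the characters", tool="count", tool_input="hello world")
    return Step(thought="The tool told me", answer=transcript[-1][len("Observation: "):])


tools = {"count": lambda text: str(len(text))}
print(react_loop(scripted_model, tools, "count the characters in 'hello world'"))  # 11
```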
That shift created the first native discipline of the era. I was no longer only writing code; I was writing the instructions that caused code to be written. I still believe this skill never fully goes away: you have to know how to talk to these models well, in one form or another. The benefit was delegation at the task level. The limit was convergence: a prompt loop without disciplined context and a reliable stop condition drifts, repeats itself, or optimises for the wrong thing.
Added: tools, a goal, a prompt loop.
3. Context Engineering (2024-2025)
Once agents could act, the bottleneck became what they could see. A coding agent needs repo context, not only a prompt. It needs files, tests, logs, conventions, architecture notes, issue history, and the current state of the work. This is where Cursor Agent, Devin, and Ralph-style loops fit. Ralph made a narrow but important point: durable state should live in files and git, not only in the chat transcript.
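Ralph's point about durable state can be sketched as a loop that starts every iteration from a fresh context rebuilt from files. This is a Python illustration of the idea under stated assumptions, not Ralph itself: `agent` is a hypothetical callable standing in for one agent run, `PROMPT.md` and `progress.md` are example file names, and the git commit a real loop would make after each iteration is left out.

```python
import tempfile
from pathlib import Path
from typing import Callable


def ralph_style_loop(workdir: Path,
                     agent: Callable[[str], str],
                     is_done: Callable[[Path], bool],
                     max_iterations: int = 10) -> int:
    """Run a fresh agent call per iteration; all memory between runs lives on disk.

    Returns the iteration on which is_done() first held, or -1 if it never did.
    """
    prompt_file = workdir / "PROMPT.md"
    progress_file = workdir / "progress.md"
    for i in range(1, max_iterations + 1):
        # No chat transcript carries over: context is rebuilt from files every time.
        progress = progress_file.read_text() if progress_file.exists() else ""
        context = prompt_file.read_text() + "\n\n## Progress so far\n" + progress
        note = agent(context)
        with progress_file.open("a") as f:
            f.write(note + "\n")  # a real loop would also commit to git here
        if is_done(workdir):
            return i
    return -1


with tempfile.TemporaryDirectory() as d:
    work = Path(d)
    (work / "PROMPT.md").write_text("Complete three tasks, one per run.")

    # Hypothetical agent: reads how many tasks the files say are done, then does the next one.
    def fake_agent(ctx: str) -> str:
        return f"done task {ctx.count('done task') + 1}"

    def three_done(w: Path) -> bool:
        return (w / "progress.md").read_text().count("done task") >= 3

    print(ralph_style_loop(work, fake_agent, three_done))  # 3
```

Because each run sees only what earlier runs wrote down, the loop survives a crashed or restarted agent session; the transcript is disposable.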
The benefit is scope. Agents work across files, run commands, inspect failures, and make repo-aware changes. The limit is that context is not correctness. A well-contextualised agent can still finish the wrong task unless the environment can tell it what done means. The pattern that matters here is the move from assistant-in-editor to agent-in-codebase.
Added: repo context, terminal and files, tests.
4. Harness Engineering (2025-2026)
A harness is the environment a single agent runs inside. It includes the prompt, repo context, tools, sandbox, permissions, tests, linters, type checks, CI, evals, and review gates. The point of a harness is not to make the model magically correct. It is to make the work observable, constrained, and checkable.
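Two of those properties, constrained and observable, can be shown with a very small wrapper around command execution. This is a hypothetical sketch, not how Codex or Claude Code implement sandboxing: `Harness`, `allowed_programs`, and the log format are invented for illustration, and an allowlist on the program name is far weaker than a real container or OS-level sandbox.

```python
import subprocess
import sys
from dataclasses import dataclass, field


@dataclass
class Harness:
    """Hypothetical single-agent harness: an allowlist of programs plus an audit log."""
    allowed_programs: set
    timeout_s: float = 60.0
    log: list = field(default_factory=list)

    def run(self, cmd: list) -> subprocess.CompletedProcess:
        # Constrained: refuse anything not explicitly permitted.
        if cmd[0] not in self.allowed_programs:
            self.log.append({"cmd": cmd, "allowed": False})
            raise PermissionError(f"{cmd[0]!r} is not allowed in this harness")
        # Observable: every permitted command and its exit code is recorded.
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=self.timeout_s)
        self.log.append({"cmd": cmd, "allowed": True, "returncode": result.returncode})
        return result


harness = Harness(allowed_programs={sys.executable})
print(harness.run([sys.executable, "-c", "print('hello')"]).stdout.strip())  # hello
try:
    harness.run(["curl", "https://example.com"])
except PermissionError as err:
    print(err)  # 'curl' is not allowed in this harness
```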
OpenAI Codex is a clear example: it runs in isolated cloud containers, works against the provided repository, edits files, runs commands, and proposes changes for review. Claude Code fits the same era from the terminal: it understands a codebase, edits files, runs commands, and handles git workflows. The role shifts with it. I design environments, specify intent, and build the feedback loops that let agents do reliable work.
The benefit is repeatability. The agent no longer just generates code; it runs inside a system that can reject bad work. Deterministic checks come first: tests, builds, type checks, lint, contract tests, benchmarks, screenshots, traces, CI. Model-based judging helps with subjective checks, but maker and checker stay separated. These checks matter most when they push back on the agent in the moment, giving the loop backpressure so it self-corrects before a human has to step in.
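Backpressure in practice means the loop runs the deterministic checks itself and hands the first failure back to the agent as its next input. A minimal sketch, assuming each check is a shell command where exit code 0 means pass; `run_checks` and `CheckResult` are hypothetical names, and the demo simulates checks with small Python commands rather than a real lint or test suite.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResult:
    passed: bool
    failed_check: Optional[str] = None
    feedback: str = ""


def run_checks(checks: list, timeout_s: float = 300.0) -> CheckResult:
    """Run deterministic checks in order, cheapest first; stop at the first failure.

    The failing check's output becomes the agent's next input, which is the
    backpressure: the loop corrects itself before a human has to step in.
    """
    for name, cmd in checks:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
        if proc.returncode != 0:
            # Keep only the tail of the output so the feedback stays small.
            return CheckResult(False, name, (proc.stdout + proc.stderr)[-4000:])
    return CheckResult(True)


# In a real repo these might be e.g. ["ruff", "check", "."], ["mypy", "."], ["pytest", "-q"].
checks = [
    ("lint", [sys.executable, "-c", "pass"]),
    ("tests", [sys.executable, "-c", "import sys; print('1 test failed'); sys.exit(1)"]),
    ("build", [sys.executable, "-c", "pass"]),
]
result = run_checks(checks)
print(result.passed, result.failed_check, result.feedback.strip())  # False tests 1 test failed
```

Stopping at the first failure keeps each turn focused on one problem; the `build` check is never reached here because `tests` already pushed back.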
Added: sandbox, verifier, CI and eval harness.
5. Loop Engineering (now)
Once the harness is reliable, the next layer is the loop that runs it. An automation executes fixed steps. A loop has a decision inside it: it checks whether the goal is met and decides whether to continue. A practical agent loop has five parts:
- Trigger: human kickoff, schedule, or event
- Goal: the desired end state
- Harness: the environment the agent runs in
- Verifier: the check that decides whether to continue
- State: memory outside the current model call
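The five parts map naturally onto a small control structure. This sketch is an illustration under assumptions, not how /goal, /loop, or Automations work internally: `AgentLoop` is hypothetical, the harness is collapsed into one callable that performs a bounded agent run and returns updated state, and state is a JSON file so it survives between triggers.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class AgentLoop:
    goal: str                               # the desired end state
    harness: Callable[[str, dict], dict]    # one bounded agent run: (goal, state) -> new state
    verifier: Callable[[dict], bool]        # decides whether the goal now holds
    state_path: Path                        # memory outside any single model call
    max_runs: int = 5

    def load_state(self) -> dict:
        return json.loads(self.state_path.read_text()) if self.state_path.exists() else {}

    def on_trigger(self, trigger: str) -> str:
        """Entry point for a human kickoff, a schedule, or an event."""
        state = self.load_state()
        state["last_trigger"] = trigger
        for _ in range(self.max_runs):
            if self.verifier(state):        # the decision inside the loop
                break
            state = self.harness(self.goal, state)
            state["runs"] = state.get("runs", 0) + 1
            self.state_path.write_text(json.dumps(state))  # persist after every run
        return "done" if self.verifier(state) else "escalate"


with tempfile.TemporaryDirectory() as d:
    loop = AgentLoop(
        goal="fix the three flaky tests",
        # Hypothetical harness run: each run fixes one flaky test.
        harness=lambda goal, s: {**s, "fixed": s.get("fixed", 0) + 1},
        verifier=lambda s: s.get("fixed", 0) >= 3,
        state_path=Path(d) / "state.json",
    )
    print(loop.on_trigger("schedule: nightly"))  # done
```

Returning `escalate` when the run budget is spent, rather than continuing indefinitely, is what bounds the cost of a loop whose goal turns out to be unreachable.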
This is where the tools are converging. Codex has /goal for long-running work with a verifiable stop condition, and Automations for recurring tasks. Claude Code has /goal, /loop, and scheduled tasks. MCP gives agents a standard way to reach external tools and data.
The benefit is leverage: a loop can watch CI, triage issues, bump dependencies, fix flaky tests, chase review feedback, and keep working until a condition holds. The risk grows with the leverage. A bad prompt wastes a turn. A bad loop wastes hours, mutates the repo, and generates a pile of plausible work that still needs judgment.
Skills and playbooks become part of the loop substrate here, not just better prompt files, because a loop that rediscovers project conventions every run is fragile. I go deeper on building a single durable loop in Loop Engineering: the trigger that wakes it, the bounded runner that does the work, and the gate that decides whether it lands.
Added: automations, worktrees, skills and playbooks, MCP connectors, durable memory, orchestration.
Each layer wraps the one below
The important word is wrap. Prompting did not replace coding. Context did not replace prompting. Harnesses did not replace context. Loops do not replace harnesses. Each layer wraps the one beneath it and changes where engineering judgment is applied.

A harness is the environment for one agent run. A loop is the control system around that harness. A factory is a system of loops: one finds work, another implements it, another verifies it, another opens the PR, and another escalates what needs human judgment. Each era lets me author less of the code directly and more of the system that produces it, trading fine-grained control for reach. The higher I stand, the more a single decision is worth, and the more it leans on the checks in the layers beneath it.
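A factory can be sketched as routing each work item through the stages in turn, with anything risky or failing sent to a human. This is a deliberately simplified, hypothetical illustration: in a real factory each stage would be its own loop with its own verifier, while here each is collapsed into a single callable, and `WorkItem`, `run_factory`, and the stage functions are invented names.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class WorkItem:
    title: str
    needs_judgment: bool = False    # e.g. touches auth, billing, or a public API
    log: list = field(default_factory=list)


Stage = Callable[[WorkItem], bool]


def run_factory(find_work: Callable[[], list],
                implement: Stage, verify: Stage, open_pr: Stage) -> dict:
    """Route each item through implement -> verify -> open PR; escalate the rest."""
    outcome = {"opened": [], "escalated": []}
    for item in find_work():
        if item.needs_judgment:
            outcome["escalated"].append(item.title)
            continue
        # all() short-circuits, so a failed verify means no PR is opened.
        ok = all(stage(item) for stage in (implement, verify, open_pr))
        outcome["opened" if ok else "escalated"].append(item.title)
    return outcome


def implement(item: WorkItem) -> bool:
    item.log.append("implemented")
    return True


def verify(item: WorkItem) -> bool:
    # Hypothetical verifier: pretend the flaky-test fix still fails CI.
    return item.title != "fix flaky test"


def open_pr(item: WorkItem) -> bool:
    item.log.append("pr opened")
    return True


backlog = [WorkItem("bump a dependency"), WorkItem("fix flaky test"),
           WorkItem("rotate auth keys", needs_judgment=True)]
print(run_factory(lambda: backlog, implement, verify, open_pr))
# {'opened': ['bump a dependency'], 'escalated': ['fix flaky test', 'rotate auth keys']}
```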
What loop-driven development actually means
Loop-driven development is TDD at a larger unit of intent. In TDD the loop wraps a unit of behaviour. Here it can wrap a task, a PR, a migration, or a recurring workflow.
The verifier is the difference between a loop and a vibe. Without one, I have repeated prompting. With one, the loop can converge. The verifier can be deterministic, like tests and builds, or probabilistic, like a separate reviewer model, but it has to exist outside the agent’s desire to be done. This is also where my role becomes more important, not less. I choose the goal, design the context, set the permissions, define the checks, review the result, and decide what risk is acceptable. The loop runs faster than I can type, but it cannot decide what should matter.
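The requirement that the verifier sit outside the agent's desire to be done can be made concrete: the agent may report that it is finished, but the loop never reads that report. A minimal sketch with hypothetical names; `deterministic` stands in for tests and builds, and `reviewer` for a separate reviewer model, both simulated here with string checks.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Attempt:
    diff: str
    agent_says_done: bool   # the maker's self-report; deliberately never consulted below


def independent_verdict(attempt: Attempt,
                        deterministic: Callable[[str], bool],
                        reviewer: Callable[[str], bool]) -> bool:
    """Maker and checker stay separate: only checks outside the agent decide 'done'."""
    if not deterministic(attempt.diff):   # tests and builds: cheap and trustworthy, so first
        return False
    return reviewer(attempt.diff)         # probabilistic check, e.g. a separate reviewer model


def converge(make_attempt: Callable[[int], Attempt],
             deterministic: Callable[[str], bool],
             reviewer: Callable[[str], bool],
             budget: int = 5) -> Optional[int]:
    """Return the index of the first attempt the verifier accepts, or None."""
    for i in range(budget):
        if independent_verdict(make_attempt(i), deterministic, reviewer):
            return i
    return None


# Every attempt claims to be done; only the third actually passes both checks.
attempts = [Attempt("quick fix TODO", True), Attempt("fix + test TODO", True), Attempt("fix + test", True)]
print(converge(lambda i: attempts[i], lambda d: "test" in d, lambda d: "TODO" not in d))  # 2
```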
The takeaway
Software was always written in a loop. TDD made the loop explicit around behaviour. BDD and acceptance testing widened it toward product intent. AI is widening it again around agents, harnesses, and recurring workflows. That is the shift from test-driven to loop-driven development. Not because tests stop mattering, but because tests, evals, reviewers, sandboxes, worktrees, skills, memory, and CI are becoming parts of a larger loop.
Build the loop. Stay the engineer.
Related
- Loop Engineering – The operational how-to for the final era: trigger, runner, gate
- The Ladder of Coding Abstraction – Matching the tool to the task across these eras
- The Anatomy of an Agent Harness – What the harness layer actually contains
- The RALPH Loop – The autonomous loop that defined the context era
- Autonomous Loops Need a Scoring Function – Why the verifier is the whole game
- Two Camps of Agentic Coding – Spec-driven versus conversational, one rung down
- Sentry Errors Should Spawn Agents on Your Own Machine – A loop in production
- Own Your Control Plane – Why the engineer stays at the controls

