Match the tool to the task. A sanding machine for walls, a detail sander for corners, sandpaper by hand for banisters. The same principle governs how you use AI coding tools.
Author: James Phoenix | Date: March 2026
The Sanding Analogy
When you’re renovating a house, you don’t use the same tool for every surface.
Big flat walls get the industrial orbital sander. Maximum surface area, maximum speed. You’d be insane to do this by hand. It would take days for what takes an hour.
Corners, edges, and trim get a detail sander. The big machine can’t reach these spots. You need something that fits the shape. Still powered, still saving you time, but smaller, more controlled, and you’re guiding it yourself.
Banisters and spindles get done by hand with sandpaper. No machine fits here. The geometry is too irregular, too curved, too delicate. Hand sanding is slower, but it’s the only thing that works. Trying to use a machine here would damage the wood.
Nobody argues about this. You pick the tool that matches the job. You move between tools constantly throughout a renovation. The leverage comes from knowing which tool to reach for, not from using the biggest one for everything.
Software engineering works the same way now.
The Four Levels
| Level | Tool | Analogy | When to Use |
|---|---|---|---|
| 4. RALPH Loop | Autonomous spec-driven loop | Industrial sander | Large, well-defined features. You write the spec, the loop executes overnight. |
| 3. Plans with Claude | Spec or plan, you stay in the loop | Detail sander | Medium features where you need to steer. You review each step, course-correct mid-stream. |
| 2. Conversational Claude/Codex | No spec, just talk to it | Sandpaper by hand | Small, fiddly tasks. Quick fixes, explorations, one-off changes. |
| 1. Manual coding | You type it yourself | Not sanding at all | Almost never. Maybe a one-line config change or a thought that’s faster to type than to explain. |
Level 4: The RALPH Loop
The industrial sander. You write a detailed spec through bidirectional prompting. You sign off on every line of the implementation plan. Then you launch the loop and walk away.
The loop picks up the next task, spins up a fresh agent, implements it, runs tests, commits, marks it done, and moves on. You check in periodically. You make architecture decisions when needed. You update the spec. But the implementation grunt work is fully autonomous.
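As a rough illustration, the cycle just described can be sketched in a few lines of Python. This is a minimal sketch, not the actual RALPH implementation: `Task`, `run_loop`, and the `implement`, `run_tests`, and `commit` callbacks are hypothetical names standing in for whatever agent runner, test command, and version control step you use. Retrying a task when tests fail, and capping the run at `max_iterations`, are assumptions added here so the sketch terminates.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Task:
    name: str
    done: bool = False


def next_open_task(tasks: list[Task]) -> Optional[Task]:
    """Return the first task not yet marked done, or None when the plan is finished."""
    return next((t for t in tasks if not t.done), None)


def run_loop(
    tasks: list[Task],
    implement: Callable[[Task], None],  # hypothetical: starts a fresh agent for one task
    run_tests: Callable[[], bool],      # hypothetical: True when the test suite passes
    commit: Callable[[Task], None],     # hypothetical: records the change in version control
    max_iterations: int = 50,           # assumption: a safety cap so the loop cannot run forever
) -> list[str]:
    """Work through the plan one task at a time; return the names of completed tasks."""
    completed: list[str] = []
    for _ in range(max_iterations):
        task = next_open_task(tasks)
        if task is None:
            break  # every task is marked done
        implement(task)  # one fresh agent per task
        if not run_tests():
            continue  # assumption: leave the task open so the next iteration retries it
        commit(task)
        task.done = True  # mark it done and move on
        completed.append(task.name)
    return completed
```

The callbacks keep the sketch independent of any particular agent or CLI. In practice each one would shell out to your own tooling, and your periodic check-ins happen outside the loop, by editing the task list or the spec it was derived from.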
This is for walls. Large, flat, well-understood surfaces. The kind of work where you know exactly what you want, the scope is clear, and the agent can execute without constant guidance.
When to use it:
- Features you can fully specify upfront
- Greenfield modules with clear boundaries
- Systematic work (test suites, migrations, API endpoints)
- Overnight or background work while you focus elsewhere
When NOT to use it:
- Exploratory work where you don’t know what you want yet
- Tightly integrated changes across many existing modules
- Anything where “what does done look like?” is unclear
Level 3: Plans Without RALPH
The detail sander. You write a plan or spec, but you stay in the loop. You’re not launching an autonomous loop. You’re working with the agent interactively, but with a plan guiding the work.
This is plan mode in Claude Code. You agree on the approach, then the agent executes step by step while you watch. You catch mistakes in real time. You course-correct before errors cascade. The plan keeps the agent on track. Your presence keeps the plan honest.
This is for corners and edges. The work is defined enough to plan, but complex enough that you need to be there. Maybe the codebase is intricate. Maybe the feature touches three existing systems. Maybe you’re not 100% sure the plan is right and you need to see the first few steps before committing.
When to use it:
- Features that touch existing code in non-trivial ways
- Refactors where you want to verify each step
- Work where you have a direction but not a full spec
- Anything where mid-stream architecture decisions are likely
Level 2: Conversational, No Spec
Sandpaper by hand. You just talk to the agent. “Fix this bug.” “Add a loading spinner here.” “What does this function do?” No spec, no plan, just conversation.
This is the most common mode. It’s how most people use Claude Code or Codex day to day. For small, contained tasks, it’s the right tool. Writing a spec for a three-line fix is overhead that costs more than it saves.
This is for banisters. The geometry is irregular. Each task is slightly different. The scope is small. You know when it’s done because you can see it. Trying to write a spec for “make this error message clearer” is like trying to use a sanding machine on a spindle. It doesn’t fit.
When to use it:
- Bug fixes
- Small UI tweaks
- Exploration (“how does this work?”)
- One-off scripts
- Code review assistance
- Anything you can verify in under a minute
Level 1: Manual Coding
Almost extinct. There are still moments where typing is faster than explaining. A one-line environment variable. A comment you want worded exactly right. A git command you’ve typed a thousand times.
But these moments are rare now. If you’re spending more than a few minutes writing code by hand, you’re probably at the wrong level. Move up.
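The per-level criteria above can be condensed into a rough decision rule. The sketch below is one possible reading, not a formula from this article: `TaskProfile`, its four boolean fields, and `choose_level` are hypothetical names, and the order of the checks is an assumption. In real work these are judgment calls you make in seconds, not values you compute.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    MANUAL = 1
    CONVERSATIONAL = 2
    PLAN = 3
    RALPH_LOOP = 4


@dataclass
class TaskProfile:
    # Hypothetical fields: each one is a quick judgment call, not a measurement.
    fully_specifiable: bool         # can you say up front what "done" looks like?
    needs_midstream_steering: bool  # are architecture decisions likely along the way?
    large: bool                     # bigger than a small, contained change?
    faster_to_type: bool            # quicker to type than to explain?


def choose_level(task: TaskProfile) -> Level:
    """Map a task profile to the level the criteria above point towards."""
    if task.faster_to_type:
        return Level.MANUAL
    if not task.large:
        return Level.CONVERSATIONAL
    if task.fully_specifiable and not task.needs_midstream_steering:
        return Level.RALPH_LOOP
    # Large, but either under-specified or in need of steering: plan and stay in the loop.
    return Level.PLAN


# A large greenfield module with a clear spec goes to the autonomous loop.
greenfield = TaskProfile(fully_specifiable=True, needs_midstream_steering=False,
                         large=True, faster_to_type=False)
assert choose_level(greenfield) is Level.RALPH_LOOP

# A refactor touching several existing systems stays interactive.
refactor = TaskProfile(fully_specifiable=True, needs_midstream_steering=True,
                       large=True, faster_to_type=False)
assert choose_level(refactor) is Level.PLAN
```

The checks run cheapest-first on purpose: if the task is trivially small, the question of whether it could be fully specified never comes up.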
The Mistake People Make
They pick one level and stay there.
The vibe coders stay at Level 2 for everything. They’ll conversationally iterate their way through a 50-file feature, burning tokens and accumulating drift, when a spec would have gotten them there in a quarter of the time.
The spec maximalists write PRDs for bug fixes. They’ll spend an hour specifying what a 30-second conversation would solve.
The RALPH evangelists try to automate tasks that need human judgment mid-stream. They launch the loop, come back to find the agent went in the wrong direction 20 iterations ago, and have to throw away everything.
Leverage comes from moving between levels fluidly. The best engineers switch levels multiple times per day. They start with RALPH on the big feature. They drop to Level 3 when they hit a tricky integration point. They do Level 2 for the quick fix that came up in code review. They might type one line manually because it’s faster than context-switching.
This is no different from how a carpenter moves between power tools, hand tools, and bare hands throughout a single day. It’s not a philosophy. It’s just craft.
How the Levels Compound
The levels aren’t independent. They reinforce each other.
Level 4 produces artifacts that make Level 3 easier. A RALPH loop creates specs, tests, and documented patterns. When you’re working at Level 3 on a related feature, that context already exists. Your plan is better because the spec ecosystem is richer.
Level 3 produces insights that improve Level 4. Working interactively, you discover edge cases and architecture decisions that should go into the spec. You update the spec. The next RALPH iteration is better.
Level 2 catches things the higher levels miss. Quick conversational debugging finds issues that formal specs didn’t anticipate. Those findings feed back into the spec (Level 4) or the plan (Level 3).
The compound effect: each level’s output becomes input for the others. Over time, the whole system gets tighter. Specs get more precise. Plans get more realistic. Conversational fixes get rarer because the higher levels catch more.
The Real Insight
You should barely ever need to write code manually now. That’s not laziness. It’s the same reason you don’t sand walls by hand when you own a sanding machine.
But “I have a sanding machine” doesn’t mean “I use the sanding machine for everything.” The machine is one tool in a set. The set is the leverage, not any individual tool.
The ladder of abstraction is about recognising which surface you’re looking at and reaching for the right tool without thinking about it. When you get this right, it feels effortless. You flow between autonomous loops, guided plans, quick conversations, and the occasional manual keystroke, all in service of the same project.
That’s how you gain leverage. Not by using the most powerful tool. By using the right one.
Related
- The RALPH Loop – Level 4: autonomous spec-driven iteration
- Watch the Ralph – Level 4 in practice on a real project
- Two Camps of Agentic Coding – The spec-driven vs conversational divide
- Building the Factory – Meta-infrastructure for each level
- Own Your Control Plane – Why matching tools to tasks matters
- Plan Mode Strategic Use – Level 3: when and how to use plans
- The Compound Systems Engineer Doctrine – The broader philosophy

