The further a piece of code sits from the core of your system, the more you can give to agents. At the core, hand-roll it or write the spec yourself. Fully outsource and you lose the ability to patch it.
The Gradient
Most code in any real system isn’t uniformly core or uniformly periphery. It sits on a gradient. At one end, the API routes you’ve built a hundred times, the CRUD glue between a form and a database, the boilerplate adapter to a third-party service. At the other end, your custom orchestration, your state machines, the domain invariants that make your system yours.
The gradient matters because it determines how much of the thinking you can give away. Periphery is cheap to regenerate. If an agent writes a CRUD handler and it drifts from what I want, I rewrite it in an afternoon. Core is load-bearing. If an agent writes my workflow state machine and it drifts, I’m re-deriving how the whole system thinks.
The error I keep watching people make, and that I made myself early on, is treating “can an agent produce working code for this?” as the test. It’s the wrong test. The real test is: will I need to patch this six weeks from now, and can I patch it without re-reading the whole thing from scratch?
Custom Code Is Outside the Training Data
When an agent writes a REST handler, it’s drafting on a million examples. CRUD is the most well-represented code pattern in every training corpus. The agent’s prior and my intent converge easily, because the prior is dense right where I need it.
When an agent writes a custom orchestration layer, a bespoke state machine, or a workflow that enforces invariants specific to my domain, it’s drafting on almost nothing. There is no canonical answer for “the slideshow generation pipeline for James’s content business.” The agent will produce something. It will compile. It will run. But the structure will reflect the agent’s averaged-out prior, not my actual requirements.
This is the thing I underweighted for a long time. Code that runs is not code I understand. When the requirement shifts, and it will shift, I’m reading the agent’s code for the first time under pressure to ship a fix.
The Maintenance Tax
The generation cost of core code is the smallest line item. The real cost is every patch over its lifetime.
Hand-rolled core code I carry in my head. When a bug surfaces, I already know the event flow, the invariants, the edge cases. Patching is a thirty-minute job because the mental model is already loaded.
Agent-generated core code I don’t carry in my head. Every patch starts with a re-read. Every re-read is a partial rediscovery of choices the agent made that I never evaluated. The compounding cost is worse than the compounding benefit of delegation, once the code is core enough.
This is why I default to hand-rolling anything that feels load-bearing. Not because agents can’t write it. Because I can’t afford to meet it as a stranger every time I touch it.
The Spec Option
The alternative to hand-rolling is writing the spec yourself with the precision that would let any agent, including a future one better than today’s, produce something you can walk into.
A tight spec for core code is not a PRD. It’s types, events, invariants, transitions, and the shape of the data moving through the system. It’s the level of detail where the agent is writing code, not choosing an architecture.
If I’m going to delegate a state machine, I’m writing the event union, the state union, the effects union, and the reducer signature myself. The agent fills in the branches. I’ve already encoded the mental model I need to carry anyway, so the output is legible to me because the shape is mine.
Spec-first delegation is hand-rolling in slow motion. It costs about the same cognitive budget. But for code that has to stay mine, it’s the minimum viable version of keeping it mine.
Worked Example: A State Machine I Did Not Buy
For a slideshow generation workflow I ship, I have a human-in-the-loop approval pipeline. Generate an idea, approve or revise, propose a render, approve or revise, render it, approve or revise, publish. Rejections route back to the right earlier stage.
The reflex pull is XState or LangGraph. Big frameworks with their own vocabulary, their own debugging story, their own upgrade path. Their own training-data density too, which is the seductive part.
Instead I wrote this shape in a few hundred lines:
type Events = UserEvents | SystemEvents;
type ReducerOutput = { state: State; effects: Array<Effects> };
const reducer = (state: State, event: Events): ReducerOutput => {
switch (state.stage) {
case "IDLE": /* ... */
case "IDEA_AWAITING_REVIEW": /* ... */
case "GENERATING_RENDER_PROPOSAL": /* ... */
// ...
}
};
class Machine {
state: State;
async dispatch(event: Events) { /* queue, reduce, interpret effects */ }
async interpretEffects(effects: Array<Effects>) { /* call tools */ }
}
Three pieces. A reducer that’s pure. An effects interpreter that calls tools and emits system events. A dispatch queue that serialises work so events stop stepping on each other. The whole thing is a type-safe switch over finite states. No framework, no DSL, no runtime dependency.
The benefit isn’t “I wrote less code.” The benefit is that every transition, every effect, every event is something I chose and can reason about without lookup. When a new stage gets added, I know where it goes. When a transition misfires, the reducer is the only place to look.
I couldn’t have fully outsourced this. The shape of the event union is the shape of my product. The stages are my business logic. An agent writing this from a vague prompt would have produced something that ran, and I’d be re-learning it every time I shipped a change. Reducer plus events is the primitive. Frameworks are what you reach for when you haven’t written down the primitive yet.
The Heuristic
When I pick up a piece of work, I ask three questions. How well-represented is this pattern in the training distribution? How often will I need to patch this? How much of the system’s correctness depends on it?
Three yeses pointing toward periphery, I delegate freely. Three pointing toward core, I hand-roll or I spec it so tightly that the code is a consequence. Anywhere in between, the spec does the work.
Be wise about which code you let stop being yours.
