The further a piece of code sits from the core of your system, the more you can give to agents. At the core, hand-roll it or write the spec yourself. Fully outsource and you lose the ability to patch it.
The Gradient
Most code in any real system isn’t uniformly core or uniformly periphery. It sits on a gradient. At one end, the API routes you’ve built a hundred times, the CRUD glue between a form and a database, the boilerplate adapter to a third-party service. At the other end, your custom orchestration, your state machines, the domain invariants that make your system yours.
The gradient matters because it determines how much of the thinking you can give away. Periphery is cheap to regenerate. If an agent writes a CRUD handler and it drifts from what I want, I rewrite it in an afternoon. Core is load-bearing. If an agent writes my workflow state machine and it drifts, I’m re-deriving how the whole system thinks.
The error I keep watching people make, and that I made myself early on, is treating “can an agent produce working code for this?” as the test. It’s the wrong test. The real test is: will I need to patch this six weeks from now, and can I patch it without re-reading the whole thing from scratch?
Custom Code Is Outside the Training Data
Think about what the model has seen a million times versus what it has seen almost never. Then look at what you’re asking it to write.
| Dense in the training corpus | Sparse or absent |
|---|---|
| REST and GraphQL handlers | Your workflow state machine |
| Postgres schemas, SQL, pagination | Your domain invariants |
| Queue consumers (Kafka, SQS, Redis) | Your orchestration of domain-specific tools |
| Auth middleware, JWT, OAuth flows | Your event and effect unions |
| CRUD endpoints, form validation | Your reducer branches and transition rules |
| Caching layers, rate limiting | How your pipeline stages feed each other |
Everything in the left column is the training distribution. Billions of lines of open-source code, Stack Overflow answers, framework docs, and tutorials have already converged on what the right answer looks like. When the agent drafts a Postgres schema or a Fastify route, its prior and my intent collapse onto each other almost immediately. The code I want is close to the modal code the agent would produce for that kind of request, which is why CRUD feels like the place agents shine.
Everything in the right column is a distribution of one. There is no canonical answer for “the slideshow generation pipeline for James’s content business.” There is no Stack Overflow thread about my event union, my reducer’s rejection routing, or the specific invariants I care about. When I ask the agent to write this, its prior pulls toward the average of all workflow code it has ever seen, which is structurally different from the thing I actually need. The agent will produce something. It will compile. It will run. But the structure reflects that averaged-out prior, not my actual requirements.
This asymmetry is the whole game. The periphery is code the training corpus has already solved. The core is code the training corpus has never seen. When an agent writes the periphery, I get near-zero drift from what I’d have written myself. When an agent writes the core, the drift is structural, invisible at review time, and only surfaces six weeks later when the requirement shifts and I’m reading the agent’s version of my own business logic as a stranger.
Code that runs is not code I understand. That’s the cost.
The Maintenance Tax
The generation cost of core code is the smallest line item. The real cost is every patch over its lifetime.
Hand-rolled core code I carry in my head. When a bug surfaces, I already know the event flow, the invariants, the edge cases. Patching is a thirty-minute job because the mental model is already loaded.
Agent-generated core code I don’t carry in my head. Every patch starts with a re-read. Every re-read is a partial rediscovery of choices the agent made that I never evaluated. The compounding cost is worse than the compounding benefit of delegation, once the code is core enough.
This is why I default to hand-rolling anything that feels load-bearing. Not because agents can’t write it. Because I can’t afford to meet it as a stranger every time I touch it.
The Spec Option
The alternative to hand-rolling is writing the spec yourself with the precision that would let any agent, including a future one better than today’s, produce something you can walk into.
A tight spec for core code is not a PRD. It’s types, events, invariants, transitions, and the shape of the data moving through the system. It’s the level of detail where the agent is writing code, not choosing an architecture.
If I’m going to delegate a state machine, I’m writing the event union, the state union, the effects union, and the reducer signature myself. The agent fills in the branches. I’ve already encoded the mental model I need to carry anyway, so the output is legible to me because the shape is mine.
Spec-first delegation is hand-rolling in slow motion. It costs about the same cognitive budget. But for code that has to stay mine, it’s the minimum viable version of keeping it mine.
Worked Example: A State Machine I Did Not Buy
For a slideshow generation workflow I ship, I have a human-in-the-loop approval pipeline. Generate an idea, approve or revise, propose a render, approve or revise, render it, approve or revise, publish. Rejections route back to the right earlier stage.
The reflex pull is XState or LangGraph. Big frameworks with their own vocabulary, their own debugging story, their own upgrade path. Their own training-data density too, which is the seductive part.
Instead I wrote this shape in a few hundred lines:
type Events = UserEvents | SystemEvents;
type ReducerOutput = { state: State; effects: Array<Effects> };
const reducer = (state: State, event: Events): ReducerOutput => {
switch (state.stage) {
case "IDLE": /* ... */
case "IDEA_AWAITING_REVIEW": /* ... */
case "GENERATING_RENDER_PROPOSAL": /* ... */
// ...
}
};
class Machine {
state: State;
async dispatch(event: Events) { /* queue, reduce, interpret effects */ }
async interpretEffects(effects: Array<Effects>) { /* call tools */ }
}
Three pieces. A reducer that’s pure. An effects interpreter that calls tools and emits system events. A dispatch queue that serialises work so events stop stepping on each other. The whole thing is a type-safe switch over finite states. No framework, no DSL, no runtime dependency.
The benefit isn’t “I wrote less code.” The benefit is that every transition, every effect, every event is something I chose and can reason about without lookup. When a new stage gets added, I know where it goes. When a transition misfires, the reducer is the only place to look.
I couldn’t have fully outsourced this. The shape of the event union is the shape of my product. The stages are my business logic. An agent writing this from a vague prompt would have produced something that ran, and I’d be re-learning it every time I shipped a change. Reducer plus events is the primitive. Frameworks are what you reach for when you haven’t written down the primitive yet.
The Heuristic
When I pick up a piece of work, I ask three questions. How well-represented is this pattern in the training distribution? How often will I need to patch this? How much of the system’s correctness depends on it?
Three yeses pointing toward periphery, I delegate freely. Three pointing toward core, I hand-roll or I spec it so tightly that the code is a consequence. Anywhere in between, the spec does the work.
Be wise about which code you let stop being yours.
