Hand-Roll the Core

James Phoenix

The further a piece of code sits from the core of your system, the more you can give to agents. At the core, hand-roll it or write the spec yourself. Fully outsource and you lose the ability to patch it.


The Gradient

Most code in any real system isn’t uniformly core or uniformly periphery. It sits on a gradient. At one end, the API routes you’ve built a hundred times, the CRUD glue between a form and a database, the boilerplate adapter to a third-party service. At the other end, your custom orchestration, your state machines, the domain invariants that make your system yours.

The gradient matters because it determines how much of the thinking you can give away. Periphery is cheap to regenerate. If an agent writes a CRUD handler and it drifts from what I want, I rewrite it in an afternoon. Core is load-bearing. If an agent writes my workflow state machine and it drifts, I’m re-deriving how the whole system thinks.

The error I keep watching people make, and that I made myself early on, is treating “can an agent produce working code for this?” as the test. It’s the wrong test. The real test is: will I need to patch this six weeks from now, and can I patch it without re-reading the whole thing from scratch?


Custom Code Is Outside the Training Data

When an agent writes a REST handler, it’s drafting on a million examples. CRUD is the most well-represented code pattern in every training corpus. The agent’s prior and my intent converge easily, because the prior is dense right where I need it.

When an agent writes a custom orchestration layer, a bespoke state machine, or a workflow that enforces invariants specific to my domain, it’s drafting on almost nothing. There is no canonical answer for “the slideshow generation pipeline for James’s content business.” The agent will produce something. It will compile. It will run. But the structure will reflect the agent’s averaged-out prior, not my actual requirements.

This is the thing I underweighted for a long time. Code that runs is not code I understand. When the requirement shifts, and it will shift, I’m reading the agent’s code for the first time under pressure to ship a fix.


The Maintenance Tax

The generation cost of core code is the smallest line item. The real cost is every patch over its lifetime.

Hand-rolled core code I carry in my head. When a bug surfaces, I already know the event flow, the invariants, the edge cases. Patching is a thirty-minute job because the mental model is already loaded.

Agent-generated core code I don’t carry in my head. Every patch starts with a re-read. Every re-read is a partial rediscovery of choices the agent made that I never evaluated. The compounding cost is worse than the compounding benefit of delegation, once the code is core enough.

This is why I default to hand-rolling anything that feels load-bearing. Not because agents can’t write it. Because I can’t afford to meet it as a stranger every time I touch it.


The Spec Option

The alternative to hand-rolling is writing the spec yourself with the precision that would let any agent, including a future one better than today’s, produce something you can walk into.

A tight spec for core code is not a PRD. It’s types, events, invariants, transitions, and the shape of the data moving through the system. It’s the level of detail where the agent is writing code, not choosing an architecture.

If I’m going to delegate a state machine, I’m writing the event union, the state union, the effects union, and the reducer signature myself. The agent fills in the branches. I’ve already encoded the mental model I need to carry anyway, so the output is legible to me because the shape is mine.
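A minimal sketch of what that spec might look like, using a hypothetical review workflow (the stage and event names here are illustrative, not from the author's system). The unions and the reducer signature are the spec; the branch bodies are what gets delegated:

```typescript
// Hand-written spec: states, events, and effects as discriminated unions.
// An agent filling in reducer branches cannot invent a stage or effect
// that isn't already named here.
type State =
  | { stage: "DRAFTING"; draftId: string }
  | { stage: "AWAITING_REVIEW"; draftId: string; reviewerId: string }
  | { stage: "PUBLISHED"; draftId: string };

type Event =
  | { type: "SUBMIT_FOR_REVIEW"; reviewerId: string }
  | { type: "APPROVE" }
  | { type: "REJECT"; reason: string };

type Effect =
  | { type: "NOTIFY_REVIEWER"; reviewerId: string }
  | { type: "PUBLISH"; draftId: string };

// The signature is part of the spec: a pure function, state in, state + effects out.
type Reducer = (state: State, event: Event) => { state: State; effects: Effect[] };

// One possible fill-in of the branches, to show the shape being enforced.
const reducer: Reducer = (state, event) => {
  switch (state.stage) {
    case "DRAFTING":
      if (event.type === "SUBMIT_FOR_REVIEW") {
        return {
          state: { stage: "AWAITING_REVIEW", draftId: state.draftId, reviewerId: event.reviewerId },
          effects: [{ type: "NOTIFY_REVIEWER", reviewerId: event.reviewerId }],
        };
      }
      return { state, effects: [] };
    case "AWAITING_REVIEW":
      if (event.type === "APPROVE") {
        return {
          state: { stage: "PUBLISHED", draftId: state.draftId },
          effects: [{ type: "PUBLISH", draftId: state.draftId }],
        };
      }
      if (event.type === "REJECT") {
        return { state: { stage: "DRAFTING", draftId: state.draftId }, effects: [] };
      }
      return { state, effects: [] };
    case "PUBLISHED":
      return { state, effects: [] };
  }
};
```

The point is the division of labour: the unions encode the mental model, so even a delegated implementation can only move between states the spec already names.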

Spec-first delegation is hand-rolling in slow motion. It costs about the same cognitive budget. But for code that has to stay mine, it’s the minimum viable version of keeping it mine.


Worked Example: A State Machine I Did Not Buy

For a slideshow generation workflow I ship, I have a human-in-the-loop approval pipeline. Generate an idea, approve or revise, propose a render, approve or revise, render it, approve or revise, publish. Rejections route back to the right earlier stage.

The reflex pull is XState or LangGraph. Big frameworks with their own vocabulary, their own debugging story, their own upgrade path. Their own training-data density too, which is the seductive part.

Instead I wrote this shape in a few hundred lines:

```typescript
type Events = UserEvents | SystemEvents;
type ReducerOutput = { state: State; effects: Array<Effects> };

const reducer = (state: State, event: Events): ReducerOutput => {
  switch (state.stage) {
    case "IDLE": /* ... */
    case "IDEA_AWAITING_REVIEW": /* ... */
    case "GENERATING_RENDER_PROPOSAL": /* ... */
    // ...
  }
};

class Machine {
  state: State;
  async dispatch(event: Events) { /* queue, reduce, interpret effects */ }
  async interpretEffects(effects: Array<Effects>) { /* call tools */ }
}
```

Three pieces. A reducer that’s pure. An effects interpreter that calls tools and emits system events. A dispatch queue that serialises work so events stop stepping on each other. The whole thing is a type-safe switch over finite states. No framework, no DSL, no runtime dependency.
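The serialising dispatch queue can be as small as a promise chain. A sketch, with the Machine internals stripped away to show only the queueing (the class name and API here are illustrative):

```typescript
// Each enqueued job chains onto the previous one, so at most one
// reduce-and-interpret cycle runs at a time: the reducer never sees
// a state that another in-flight event is about to replace.
class SerialQueue {
  private tail: Promise<void> = Promise.resolve();

  enqueue<T>(job: () => Promise<T>): Promise<T> {
    const result = this.tail.then(job);
    // Swallow rejections on the chain itself so one failed job
    // doesn't wedge every job queued after it.
    this.tail = result.then(() => undefined, () => undefined);
    return result;
  }
}
```

A `dispatch` built on this would wrap the reduce-then-interpret step in `queue.enqueue(...)`, so concurrent user and system events apply one at a time in arrival order.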

The benefit isn’t “I wrote less code.” The benefit is that every transition, every effect, every event is something I chose and can reason about without lookup. When a new stage gets added, I know where it goes. When a transition misfires, the reducer is the only place to look.
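One concrete payoff of the finite switch, assuming TypeScript with strict checks on: an exhaustiveness guard makes the compiler point at every branch a new stage needs. A sketch with illustrative stage names:

```typescript
// Adding a member to Stage breaks compilation at the assertNever call
// in every switch that doesn't yet handle it.
type Stage = "IDLE" | "IDEA_AWAITING_REVIEW" | "RENDERING";

function assertNever(x: never): never {
  throw new Error(`Unhandled stage: ${JSON.stringify(x)}`);
}

function label(stage: Stage): string {
  switch (stage) {
    case "IDLE": return "waiting for work";
    case "IDEA_AWAITING_REVIEW": return "idea needs approval";
    case "RENDERING": return "render in progress";
    default: return assertNever(stage); // compile error until every stage is handled
  }
}
```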

I couldn’t have fully outsourced this. The shape of the event union is the shape of my product. The stages are my business logic. An agent writing this from a vague prompt would have produced something that ran, and I’d be re-learning it every time I shipped a change. Reducer plus events is the primitive. Frameworks are what you reach for when you haven’t written down the primitive yet.


The Heuristic

When I pick up a piece of work, I ask three questions. How well-represented is this pattern in the training distribution? How often will I need to patch this? How much of the system’s correctness depends on it?

When all three answers point toward periphery, I delegate freely. When all three point toward core, I hand-roll, or I spec it so tightly that the code is a consequence. Anywhere in between, the spec does the work.

Be wise about which code you let stop being yours.

