The single biggest thing I have learned about working with coding agents is this. If I give Claude Code or Codex a generic prompt like “build the campaign approval flow”, the agent invents three things at once. It invents the product, it invents the domain model, and it invents the implementation. All in the same pass. Because those three inventions happen together, the result drifts in whichever direction the last example in the training data pulled it. I get something that works, and something that looks reasonable, and something I quietly dislike without quite being able to say why.
The fix is not better prompting. The fix is to arrive at the agent with two artefacts already in hand. A mocked UI and a PoC script. Then the prompt becomes “here is the mocked UI, here is the PoC script, produce a shared domain layer and wire one vertical slice.” That is a projection task, not an invention task. The agent is no longer deciding what the product is. It is no longer deciding what the domain is. It is only deciding how two artefacts line up.
This changes the expected quality of the output more than any model upgrade I have seen in the last year. It is also the thesis of this post. Everything below is the concrete story that convinced me of it.
The two days
On April 23 I worked only on the backend. A green-screen meme state machine in TypeScript, running locally, keyed off a 350-asset seed manifest with ffprobe metadata and chroma-key calibration per template. Real ffmpeg renders. Real OpenRouter tool calls. Langfuse traces. VCR-cached integration tests. Zero UI.
On April 24 I worked only on the mock UI. Three parallel Codex sessions in a Git worktree, building the campaigns page, the calendar page, and the posts page. Dummy data. Hard-coded IDs. A queue view. A create-campaign modal. Keyboard shortcuts. No backend wired up.
Nothing about day one mentions the UI. Nothing about day two touches the PoC. But the UI from day two keeps referring to objects discovered on day one. The “Skip Render Proposal” toggle in the campaign settings only exists because the PoC taught me rendering is cheap enough to skip the proposal stage for memes. The Approve / Feedback / Reject trio on each item card mirrors the action space the state machine actually accepts. None of those decisions came from a design doc. They came from the collision between two documents that were written for completely different reasons.
Two entropy reducers
The PoC script reduces capability entropy. It answers questions the UI cannot answer by itself. How slow is a render? What does the LLM actually produce? What tool outputs come back empty? In my case the PoC exposed that search tools were returning empty arrays for reasonable queries, which a mock UI would have happily hidden behind a spinner forever.
The mock UI reduces semantic entropy. It answers questions the PoC cannot answer by itself. What is separately reviewable? What should remain stable when the user rejects something? When I first built the queue, I placed items inside their parent campaign. I rebuilt it as one unified queue across all campaigns because the UI forced the question and the campaign-scoped layout felt bureaucratic. A PoC script would never have surfaced that.
Neither document, on its own, is enough to ship. Together they trap the product between two walls of constraint, and those walls are exactly what the agent needs to do good work.
Why the order matters
I used to think of mock and PoC as symmetric. Working on the actual code convinced me they are not. The PoC came first, and that was not an accident.
The PoC defines the objects. Composite plans, chroma-key profiles, caption safe bands by aspect ratio, subject-analysis bounds, render critiques. These are the nouns of the product.
The mock UI then negotiates which of those objects the user should ever see or touch. Most of the chroma-key calibration is invisible. The user never picks a similarity value or a blend coefficient. The whole point of the ingest analysis work was to remove those decisions from the LLM and from the user and park them on the template itself. The UI reflected that. The only green-screen decision the user gets to make is caption placement, because that is the only one the template cannot deterministically solve.
The sharper framing is this. The PoC defines what objects exist. The mock UI defines which of those objects the user should be allowed to feel.
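To make that split concrete, here is a minimal TypeScript sketch. Every type and field name below is invented for illustration; the post only names the real domain objects in passing. The point it demonstrates: the chroma-key calibration lives entirely on the template, and the only thing projected out for the user is caption placement.

```typescript
// Illustrative domain types, not the actual codebase.

interface ChromaKeyProfile {
  similarity: number; // calibrated per template at ingest; never user-set
  blend: number;      // likewise invisible to the user
}

interface CaptionBand {
  aspectRatio: "9:16" | "1:1" | "16:9";
  topPct: number;     // safe band the caption may occupy
  bottomPct: number;
}

interface GreenScreenTemplate {
  id: string;
  chromaKey: ChromaKeyProfile; // domain-only: parked on the template
  captionBands: CaptionBand[]; // defaults the user chooses between
}

// Project the domain object down to the only decision the user may feel:
// which safe band the caption goes in. chromaKey never crosses this line.
function surfaceForUser(t: GreenScreenTemplate) {
  return { templateId: t.id, captionBands: t.captionBands };
}
```

The projection function is the whole idea in one line: the PoC decides what fields exist, the mock decides which of them survive the projection.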
The state machine is the stitch point
My first prompt on April 23 asked the agent to keep the state-machine stages identical to the existing slideshow machine. At the time I meant something narrow. I did not want to fork the state machine just because green-screen memes have different tool calls.
By day two it was clear the stages are doing much more than code reuse. They are the shared protocol between the PoC and the UI. The PoC advances an item through stages. The UI shows the item in its current stage and offers the stage-specific actions. When I added “Skip Render to Publish” to the settings, I was not inventing UI behaviour. I was exposing a stage transition the PoC already supported.
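A minimal sketch of what “stages as shared protocol” can look like, with invented stage names and a hypothetical `advance` function standing in for the real machine. The PoC calls `advance`; the UI reads `stage` and offers the stage-specific actions; the skip toggle is just a parameter to a transition that already exists.

```typescript
// Illustrative stages and transitions, not the real state machine.

type Stage = "draft" | "proposal" | "render" | "review" | "publish";

const NEXT: Record<Stage, Stage | null> = {
  draft: "proposal",
  proposal: "render",
  render: "review",
  review: "publish",
  publish: null, // terminal
};

interface Item {
  id: string;
  stage: Stage;
}

// The single transition function both sides rely on. skipProposal models
// the settings toggle: renders are cheap for memes, so the proposal
// stage can be bypassed without forking the machine.
function advance(item: Item, opts: { skipProposal?: boolean } = {}): Item {
  let next = NEXT[item.stage];
  if (next === "proposal" && opts.skipProposal) {
    next = NEXT["proposal"]; // jump straight to render
  }
  return next ? { ...item, stage: next } : item;
}

// The UI's action space per stage mirrors what the machine accepts.
const ACTIONS: Record<Stage, string[]> = {
  draft: [],
  proposal: ["approve", "feedback", "reject"],
  render: [],
  review: ["approve", "feedback", "reject"],
  publish: [],
};
```

Under this shape, adding a skip toggle to the settings page is exposing `opts.skipProposal`, not inventing behaviour.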
If I had let the PoC and the UI evolve on different domain models, stitching them would have been a translation problem. Because they share stages, it is an assembly problem. That shift is the exact thing that makes the agent’s job mechanical.
Two of three inventions removed
I want to come back to the thesis, because this is where it lands.
The agent’s failure mode is simultaneous invention. Product, domain, implementation, all at once. That is the pass in which drift enters.
The mock UI and the PoC script, built in parallel, remove two of those three inventions before the agent ever starts. The agent is not asked to invent the product, because the mock UI already encodes it. The agent is not asked to invent the domain, because the PoC already defines it. The only thing left is the projection. And projection is exactly the kind of task modern agents are extremely good at.
I can feel the difference in how I prompt. The prompts I send to Claude Code get shorter and more mechanical over time, not longer and more elaborate. I am no longer writing paragraph-long instructions hoping to pin down taste. I am saying “here is the mock, here is the PoC, merge them, wire one vertical slice, do not invent new states.” The agent thanks me by producing something that looks like it was written by a single-minded engineer instead of a confident tourist.
Every extra mock tightens the constraint on the PoC. Every extra PoC capability removes a degree of freedom from the UI. This is the signal that the workflow is working. The instruction surface area is shrinking while the output quality is going up.
The one failure mode
Mock-first only works if the mock does not promise capabilities the PoC cannot back up. If the UI shows a button labelled “Regenerate with feedback” and the PoC cannot actually accept feedback as an input, the mock becomes theatre and the projection pass produces nonsense.
I now tag each interaction in the mock as proven, partially proven, unproven, or fake, and I only ship the proven ones to the merge step. Everything else is a note to the future me who is about to get very confident about demo day.
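The tagging scheme is simple enough to sketch. Everything here, from the `Proof` union to the `shippable` filter, is an illustrative reconstruction rather than my actual code, but the shape is the point: the merge step only ever sees interactions the PoC has demonstrably backed.

```typescript
// Illustrative capability tags on mock interactions.

type Proof = "proven" | "partially-proven" | "unproven" | "fake";

interface MockInteraction {
  label: string; // e.g. the button text in the mock UI
  proof: Proof;  // does the PoC actually back this up?
}

// Only proven interactions reach the merge prompt; everything else
// stays behind as a note, not an instruction to the agent.
function shippable(mock: MockInteraction[]): MockInteraction[] {
  return mock.filter((i) => i.proof === "proven");
}
```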
The rule I keep coming back to. Never let the UI mock lie about capability. Never let the PoC script invent the product. Build both, then hand the agent a projection task instead of an invention task. That is the whole game.

