Three artefacts. Three reduced ambiguities. One projection task instead of three inventions.

The Failure Mode
The single biggest thing I have learned about working with coding agents is this. If I give Claude Code or Codex a generic prompt like “build the campaign approval flow”, the agent invents three things at once.
It invents the product. It invents the domain model. It invents the implementation.
Because those three inventions happen together, the result drifts in whichever direction the last example in the training data pulled it. I get something that works and looks reasonable, and that I quietly dislike without quite being able to say why.
The fix is not better prompting. Longer prompts do not pin down taste. Telling the model to be “senior” or “production ready” does not work either.
The fix is to arrive at the agent with three artefacts already in hand.
- Mock screens
- A PoC backend
- A spec file
Then the prompt becomes “here are the mock screens, here is the PoC backend, here is the spec, produce a shared domain layer and wire one vertical slice without inventing new semantics.” That is a projection task, not an invention task. The agent is no longer deciding what the product is. It is no longer deciding what the domain is. It is no longer deciding what the long-term contract is. It is only computing the overlap.
The Three Artefacts

Each corner of the triangle removes a different class of ambiguity.
Mock screens reduce product ambiguity. What does the user see? What is reviewable? What is grouped? What is hidden? What feels primary versus secondary?
The PoC backend reduces capability ambiguity. Can this actually run? What does the model produce? Which tool calls fail? How slow is the render? Which assumptions are fake?
The spec file reduces intent ambiguity. What states are allowed? What invariants must hold? What is explicitly out of scope? Which behaviours need tests?
Any one of these alone is weak. Mock screens alone become theatre. A PoC backend alone is a clever script with no product taste. A spec alone is aspiration unless it is grounded in a screen and a working binary.
Together, they create a high-friction path for bad implementation ideas. That friction is the point. Agents need constraints more than they need encouragement.
Three Entropy Reducers

I worked on this concretely across two days.
On April 23 I worked only on the backend. A green-screen meme state machine in TypeScript, running locally, keyed off a 350-asset seed manifest with ffprobe metadata and chroma-key calibration per template. Real ffmpeg renders. Real model calls. Langfuse traces. VCR-cached integration tests. Zero UI.
On April 24 I worked only on the mock UI. Three parallel Codex sessions in a Git worktree, building the campaigns page, the calendar page, and the posts page. Dummy data. Hard-coded IDs. A queue view. A create-campaign modal. Keyboard shortcuts. No backend wired up.
The third artefact was the spec. Not a giant enterprise document. A .md file capturing product intent, the state machine, artefacts, invariants, acceptance criteria, and known fake bits.
The PoC exposed that search tools were returning empty arrays for reasonable queries, which a mock UI would have happily hidden behind a spinner forever. The mock surfaced that placing items inside their parent campaign felt bureaucratic, so I rebuilt it as one unified queue. The spec turned both discoveries into durable rules: queue items are globally reviewable, review actions must map to explicit reducer events, and fake UI controls must not be wired into production routes.
None of those decisions came from a generic design doc. They came from collisions between three artefacts written for different reasons.
Why Order Matters
I used to think mock, PoC, and spec were symmetric. Working on the actual code convinced me they are not. The PoC usually comes first.
The reason is simple. Mock screens lie much more easily than a PoC backend. A mock can show a beautiful “Regenerate with feedback” button even if no system exists that can accept feedback and route it into the next generation. The PoC is the antidote to that theatre. It proves that the system has a causal spine: input goes in, tool call happens, model returns structured data, state transition occurs, artefact is produced, test can replay the path.
Once that spine exists, the mock screens negotiate which objects the user should ever see or touch. Then the spec consolidates both sides.
The PoC defines what objects can exist. The mock defines which of those objects the user should be allowed to see or touch. The spec defines which decisions must survive the next refactor.
Most of the chroma-key calibration is invisible. The user never picks a similarity value or a blend coefficient. The whole point of the ingest analysis work was to remove those decisions from the model and from the user and park them on the template itself. The UI reflected that. The spec preserved the reason: chroma-key calibration is template metadata, not user configuration. Caption placement is user-reviewable because subject position varies by generated content.
That kind of sentence saves you later. Without it, an agent sees a field and assumes it should be exposed.
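One way to make that spec sentence hard to ignore is to encode it in the types, so the review view has no calibration field to expose. This is a minimal sketch, not the actual code: the field names `similarity` and `blend` follow the wording above, while `Template`, `CaptionPlacement`, `ReviewView`, `toReviewView`, and every value are hypothetical.

```typescript
// Calibration lives on the template. The exact shape is an assumption.
interface ChromaKeyCalibration {
  similarity: number;
  blend: number;
}

interface Template {
  id: string;
  chromaKey: ChromaKeyCalibration;
}

// Hypothetical: caption position as fractions of frame width and height.
interface CaptionPlacement {
  x: number;
  y: number;
}

// What the review UI is allowed to see. There is deliberately no chromaKey field.
interface ReviewView {
  templateId: string;
  caption: CaptionPlacement;
}

// Projects a template plus a generated caption position into the reviewable shape.
// Calibration is dropped here, so the UI cannot accidentally expose it.
function toReviewView(template: Template, caption: CaptionPlacement): ReviewView {
  return { templateId: template.id, caption: { ...caption } };
}

// Made-up values, for illustration only.
const exampleTemplate: Template = {
  id: "tpl-001",
  chromaKey: { similarity: 0.12, blend: 0.05 },
};
const exampleView = toReviewView(exampleTemplate, { x: 0.5, y: 0.85 });
```

With this shape, an agent that wants to surface calibration in the UI has to change `ReviewView` first, and that change shows up in review as a contradiction of the spec.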
The State Machine Is the Stitch Point
My first prompt on April 23 asked the agent to keep the state-machine stages identical to the existing slideshow machine. At the time I meant something narrow. I did not want to fork the state machine just because green-screen memes had different tool calls.
By day two it was clear the stages were doing more than code reuse. They were the shared protocol between the PoC, the UI, and the spec. The PoC advances an item through stages. The UI shows the item in its current stage and offers stage-specific actions. The spec names the allowed stages, transitions, and invariants.
The same stage enum powers the reducer, the tests, the mock data, the queue, the timeline, and eventually the persisted database row. If the PoC and UI evolved on different domain models, stitching them later would be a translation problem. Because they share stages, it is an assembly problem. Translation invites agent drift. Assembly constrains it.
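A rough sketch of what sharing the stages can look like in TypeScript. The stage and event names below are hypothetical placeholders, since the real machine's stages are not listed here. The point is only that the reducer, the UI actions, and the mock data all derive from one definition.

```typescript
// Hypothetical stage names; substitute the real machine's stages.
const STAGES = ["draft", "generating", "in_review", "approved", "rejected"] as const;
type Stage = (typeof STAGES)[number];

// Hypothetical review events. Each UI action must map to exactly one of these.
type ReviewEvent =
  | { type: "GENERATION_STARTED" }
  | { type: "GENERATION_FINISHED" }
  | { type: "APPROVED" }
  | { type: "REJECTED" }
  | { type: "REGENERATE_REQUESTED"; feedback: string };

type EventType = ReviewEvent["type"];

// Single source of truth for allowed transitions.
const TRANSITIONS: Record<Stage, Partial<Record<EventType, Stage>>> = {
  draft: { GENERATION_STARTED: "generating" },
  generating: { GENERATION_FINISHED: "in_review" },
  in_review: {
    APPROVED: "approved",
    REJECTED: "rejected",
    REGENERATE_REQUESTED: "generating",
  },
  approved: {},
  rejected: {},
};

// Used by the PoC: advance an item, or fail loudly on an illegal transition.
function reduce(stage: Stage, event: ReviewEvent): Stage {
  const next = TRANSITIONS[stage][event.type];
  if (next === undefined) {
    throw new Error(`Illegal transition: ${event.type} from ${stage}`);
  }
  return next;
}

// Used by the UI: which stage-specific actions to offer for an item.
function availableActions(stage: Stage): EventType[] {
  return Object.keys(TRANSITIONS[stage]) as EventType[];
}

// Used by the mock: dummy rows are typed against the same Stage union.
const mockQueue: { id: string; stage: Stage }[] = [
  { id: "item-1", stage: "in_review" },
  { id: "item-2", stage: "approved" },
];
```

Because the mock rows are typed as `Stage`, a mock that invents a stage the PoC does not know about fails to compile instead of failing at stitch time.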
Mocks Need Proof Labels
The one failure mode of mock-first work is that mocks become too convincing. A beautiful fake interaction can poison the implementation because the agent assumes it represents a real capability.
I now tag each interaction in the mock as one of four states.
- Proven. PoC supports it and at least one test exists.
- Partially proven. Domain shape exists, full path not wired.
- Unproven. UI proposes the behaviour, backend has not validated it.
- Fake. Exists only to explore product feel and must not be merged.
This single habit prevents a lot of self-deception. Fast mock work is good. Fake confidence is expensive.
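The labels are easier to keep honest if they are data rather than comments. Here is a small sketch of a check that could run before a merge. `MockInteraction`, `mergeBlockers`, and the sample entries are all hypothetical, and the two rules enforced are just the ones implied by the definitions above.

```typescript
type ProofLabel = "proven" | "partially_proven" | "unproven" | "fake";

// Hypothetical record attached to each interaction in the mock.
interface MockInteraction {
  name: string;
  label: ProofLabel;
  tests: string[]; // names of tests that exercise the PoC path, if any
}

// Returns a human-readable list of reasons the mock must not be merged as-is.
function mergeBlockers(interactions: MockInteraction[]): string[] {
  const problems: string[] = [];
  for (const interaction of interactions) {
    if (interaction.label === "fake") {
      // Fake interactions exist only to explore product feel.
      problems.push(`${interaction.name}: fake, must not be merged`);
    } else if (interaction.label === "proven" && interaction.tests.length === 0) {
      // "Proven" requires at least one test, by definition.
      problems.push(`${interaction.name}: labelled proven but has no test`);
    }
  }
  return problems;
}

// Made-up sample data, for illustration only.
const sampleInteractions: MockInteraction[] = [
  { name: "approve-item", label: "proven", tests: ["approve moves item to approved"] },
  { name: "regenerate-with-feedback", label: "unproven", tests: [] },
  { name: "bulk-reschedule", label: "fake", tests: [] },
];

const sampleProblems = mergeBlockers(sampleInteractions);
```

Partially proven and unproven interactions pass this check on purpose. They are honest about their status, which is all the label is meant to guarantee.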
Two Inventions Removed Is Good. Three Is Better.

The agent’s failure mode is simultaneous invention. Product, domain, implementation, all at once. That is the pass in which drift enters.
Mock screens, PoC backend, and spec file remove most of that invention before the agent ever starts. The agent is not asked to invent the product, because the mock screens encode it. The agent is not asked to invent the domain, because the PoC discovered the real objects. The agent is not asked to invent the durable contract, because the spec names the stages, invariants, non-goals, and acceptance criteria. The only thing left is projection. Projection is exactly the kind of task modern agents are extremely good at.
I can feel the difference in how I prompt. The prompts I send to Claude Code and Codex get shorter and more mechanical over time, not longer and more elaborate. I am no longer writing paragraph-long instructions hoping to pin down taste. I am saying “here are the mock screens, here is the PoC backend, here is the spec, merge them, wire one vertical slice, preserve the states, update the tests, do not invent new semantics.” The agent thanks me by producing something that looks like it was written by a single-minded engineer instead of a confident tourist.
Every extra mock tightens the product constraint. Every extra PoC capability removes a degree of freedom from the implementation. Every spec update preserves intent so the next pass does not have to rediscover it. The instruction surface area shrinks while the output quality goes up.
The Rule
Never let the UI mock lie about capability. Never let the PoC backend invent the product. Never let the spec drift away from what the UI and backend have actually proven.
Build all three, then hand the agent a projection task instead of an invention task. That is the whole game.

