Two Files Keep a Long /goal Run Alive

James Phoenix

The brief never changes; the state changes every turn. Put each in its own file. Point the /goal at the small, stable markdown brief and let the agent read and write the growing JSON state through tool calls, so the prompt stays the same size on turn one and turn three hundred.

Date: June 2026

What a /goal Is

A /goal is the agent verb that runs until a verifiable condition holds and then stops, with a fast model checking after each turn whether the work is actually done. It is the “keep going until the tests pass” command, distinct from a /loop that repeats while you watch and a /scheduled routine that fires while you are gone (the three verbs are pulled apart in Loop Engineering). Because a /goal can run for hundreds of turns unattended, it is the verb where carrying state badly hurts the most, which is what this note fixes.

The Failure Mode This Fixes

A /goal run dies the same way every time it dies on length. You write one big brief, the agent works, and on each turn it re-narrates everything it has done so far back into the context. By the fiftieth turn the prompt is a wall of “here is what I did, here is what is left, here is the current status of all 36 items,” and the model is re-judging the whole job from scratch every cycle. Somewhere past the context budget the run either compacts away the part it needed or just stops converging. I watched this happen on a prototype port and the lesson was blunt: the agents were not lazy; they had no state, so they re-derived the world on every turn against a vague memory of it (the full story is in Pixel Diffs Are the Prototype Spec).

The fix is not a bigger context window. It is to stop carrying the state in the prompt at all. Separate the thing that never changes from the thing that changes every turn, and give each its own file.

Markdown Holds the Goal, JSON Holds the State

A long run has two kinds of information in it, and they behave in opposite ways. Once you see that, the split is obvious: give each kind its own file.

One kind is the goal: what done means, the rules, the things not to touch, the order to work in. This barely changes from the first turn to the last. It is something a human writes and reads, so it goes in a markdown file, GOAL.md (call it PROGRESS.md or BRIEF.md if you prefer).

The other kind is the state: which items are done, which failed, the latest score for each, files touched, where the agent is in the queue. This changes on every single turn. No one reads it as prose; the agent reads it to pick what to do next, then updates the one thing that changed. It goes in a JSON file, state.json, because JSON is easy for the agent to update one field at a time and easy for you to diff.
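To make the shape concrete, here is a minimal Python sketch of what such a ledger and a one-field update might look like. The field names `id` and `filesTouched`, the `"pending"` status, the example paths, and the helper `update_item` are all illustrative assumptions; the note itself only names a status of "done" and a per-item metric.

```python
import json
from pathlib import Path

# Hypothetical ledger: one entry per item, keyed by a stable id.
# The route ids and file path below are made up for illustration.
EXAMPLE_STATE = {
    "items": [
        {"id": "/dashboard", "status": "pending", "diffMetric": 0.31,
         "filesTouched": []},
        {"id": "/settings", "status": "done", "diffMetric": 0.01,
         "filesTouched": ["apps/web/app/settings/page.tsx"]},
    ]
}


def update_item(path: Path, item_id: str, **fields) -> dict:
    """Load the ledger, patch the given fields on one item, write it back."""
    state = json.loads(path.read_text())
    for item in state["items"]:
        if item["id"] == item_id:
            item.update(fields)  # only the named fields change
            break
    else:
        # The loop finished without a match: refuse to invent a new item.
        raise KeyError(f"no item with id {item_id!r}")
    path.write_text(json.dumps(state, indent=2) + "\n")
    return state
```

Because each update rewrites one item's fields and leaves the rest alone, a `git diff` of the file after a turn shows only the lines for the item that moved.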

The shortest way to hold the difference: the markdown is the rulebook, the JSON is the scoreboard. You point the agent at the rulebook once, and it updates the scoreboard after every move. Putting both in one file is what blows the budget, because the steady rulebook gets dragged along with the scoreboard every time the score ticks, and the combined blob just keeps growing.

Point the /goal at the Markdown, Not the State

The move that makes this work in practice is counterintuitive: the actual /goal prompt you type is tiny, and it points at the markdown file rather than restating the job. Then GOAL.md itself tells the agent where the state lives and how to use it. I usually have the LLM author both files first, then write the launch line, something like:

/goal Read GOAL.md and follow it. Before each unit of work, read
state.json to pick the next item; after each unit, write the result
back into state.json. Stop when every item has status "done" or you
hit the turn cap.

And GOAL.md carries the durable half:

# Goal: port all routes to apps/web at visual parity

State lives in `state.json`. Never restate it here.

## Definition of done
- Every entry in state.json has status: "done"
- Each done entry has diffMetric < 0.02

## Loop
1. Read state.json, sort by diffMetric descending
2. Take the worst-failing item, do the work
3. Re-measure, write the new metric and status back to state.json
4. Do NOT touch items already marked "done"

## Do not touch
- apps/web/lib/auth/**

The prompt stays the same length forever because the only things that grow live behind a file read, not inside the prompt. The agent pulls the slice of state it needs when it needs it, which is just progressive disclosure applied to the run’s own memory.
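The Loop section of that GOAL.md can be sketched in a few lines of Python. This is only an illustration of the selection and write-back steps, not what the agent literally runs: the helpers `pick_next_item` and `record_result`, the `id` field, and the `"failed"` and `"pending"` statuses are assumptions, and the 0.02 threshold is taken from the definition of done above.

```python
from typing import Optional


def pick_next_item(state: dict) -> Optional[dict]:
    """Return the worst-scoring item that is not done, or None if none remain."""
    open_items = [item for item in state["items"] if item["status"] != "done"]
    if not open_items:
        return None
    # Highest diffMetric first: the worst-failing item gets worked next.
    return max(open_items, key=lambda item: item["diffMetric"])


def record_result(state: dict, item_id: str, diff_metric: float,
                  threshold: float = 0.02) -> None:
    """Write a re-measured metric onto one item and update its status."""
    for item in state["items"]:
        if item["id"] == item_id:
            item["diffMetric"] = diff_metric
            # "failed" is an assumed name for "measured, still over threshold".
            item["status"] = "done" if diff_metric < threshold else "failed"
            return
    raise KeyError(f"no item with id {item_id!r}")


state = {"items": [
    {"id": "a", "status": "done", "diffMetric": 0.01},
    {"id": "b", "status": "pending", "diffMetric": 0.40},
    {"id": "c", "status": "failed", "diffMetric": 0.12},
]}
next_item = pick_next_item(state)  # item "b", the worst open item
```

Items already marked "done" never enter the candidate list, which is the code form of the "do NOT touch" rule.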

Why It Survives a Long Run

Four properties fall out of the split, and together they are why this holds up where one big brief falls over.

The prompt size is constant. Turn one and turn three hundred send the same launch line and the same GOAL.md, plus a read of a state.json whose size is set mostly by the number of items, not by the number of turns. The context cost stays flat instead of climbing with every turn, so you never walk off the cliff where the run has to compact and forgets its own ledger.

The state is the queue and the exit condition, in one place. Because every item’s status sits in JSON, “what do I do next” is a sort and “am I done” is a predicate over the file, not a vibe the model reconstructs from chat history. The list is the queue, the threshold is the gate. This is exactly what makes a /goal safe to leave running: the gate is a machine check over state.json, not the model grading its own narration (the verb-and-gate framing is in Loop Engineering).
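A gate of that kind might look like the following Python sketch. The function names `is_done` and `gate_exit_code` and the `items` layout are assumptions; the 0.02 threshold comes from the GOAL.md example earlier in this note.

```python
import json
from pathlib import Path

THRESHOLD = 0.02  # from the GOAL.md example: diffMetric < 0.02


def is_done(state: dict, threshold: float = THRESHOLD) -> bool:
    """True only when every item is done and its metric is under the threshold."""
    items = state["items"]
    # Guard the empty case: all() over no items is True, and an empty
    # ledger should not count as a finished job.
    return bool(items) and all(
        item["status"] == "done" and item["diffMetric"] < threshold
        for item in items
    )


def gate_exit_code(path: str = "state.json") -> int:
    """Return 0 when the gate passes and 1 otherwise, like a shell check."""
    state = json.loads(Path(path).read_text())
    return 0 if is_done(state) else 1
```

The check reads only the file, so it gives the same answer no matter what the conversation currently claims about progress.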

Compaction stops causing drift. The usual way a long run goes off the rails is a compaction: the context fills, the harness summarizes the history to make room, and the lossy summary quietly drops the constraint or the do-not-touch line the agent was holding in its head. From there it wanders. With the split, the goal was never in the chat history to lose. It lives in GOAL.md, so the agent re-reads the brief after a compaction and snaps straight back to the real definition of done instead of drifting off a half-remembered one. The two files are the canonical copy; the conversation is just scratch space the agent can afford to have summarized away.

A fresh agent resumes for free. When context rots or the process dies, the next agent reads the same two files and is instantly current. There is no session state to lose, which is the whole point of file-based memory in the RALPH loop and the broader checkpoint/resume pattern. The markdown says what the job is; the JSON says how far along it got. Hand both to a clean context and it carries on mid-stream.
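Resuming only works if the ledger on disk is never left half-written when a process dies. One way to get that, sketched below in Python, is to write to a temporary file and swap it into place. This safeguard is my addition rather than something the pattern requires, and the helper names `save_state` and `remaining_ids` and the `items` layout are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def save_state(path: Path, state: dict) -> None:
    """Write the ledger to a temp file in the same directory, then swap it in."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(state, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace swaps the file in one step, so a reader sees either
        # the old ledger or the new one, never a partial write.
        os.replace(tmp_name, path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def remaining_ids(path: Path) -> list:
    """What a fresh process needs to carry on: the ids not yet done."""
    state = json.loads(path.read_text(encoding="utf-8"))
    return [item["id"] for item in state["items"] if item["status"] != "done"]
```

A new agent that calls `remaining_ids` on the same file picks up exactly where the last one stopped, with no session to restore.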

The Asymmetry Is the Point

It is tempting to ask why not one JSON file with a goal field, or one markdown file with a status table. The answer is that the two halves want different things from their format, and forcing them into one file makes both worse.

The goal wants to be diff-stable and human-editable. If I want to add a constraint mid-run, I edit one line of prose in GOAL.md and every future turn picks it up. A status table buried in the same file would churn underneath my edit and create merge noise.

The state wants to be machine-writable at high frequency and queryable. JSON gives the agent a structured target it can patch one field at a time and that I can jq or diff to see progress. Prose status logs are append-only narration that grows without bound and cannot be sorted. (If your state is genuinely a workflow with stages rather than a flat ledger, markdown as a state machine is the cousin pattern, but the moment you are tracking metrics per item, reach for JSON.)

A few things keep it clean. Give state.json a tiny schema and tell the agent to honor it, so it does not invent new shapes mid-run. Keep GOAL.md short enough that re-reading it every turn is cheap. And put a turn cap or budget in the launch prompt, because a two-file /goal is still a loop and a loop with no stop condition is just a way to spend money (the scoring-function note and the cost section of Loop Engineering both make this case).
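A tiny schema can be enforced with a check as small as the Python sketch below. The required fields, the status vocabulary, and the `validate_state` helper are assumptions chosen to match the examples in this note; swap in whatever shape your own ledger uses.

```python
from typing import List

# Assumed vocabulary and field types for the ledger.
ALLOWED_STATUSES = {"pending", "failed", "done"}
REQUIRED_FIELDS = {"id": str, "status": str, "diffMetric": (int, float)}


def validate_state(state: object) -> List[str]:
    """Return a list of schema problems; an empty list means well-formed."""
    if not isinstance(state, dict) or not isinstance(state.get("items"), list):
        return ["top level must be an object with an 'items' list"]
    problems = []
    seen_ids = set()
    for index, item in enumerate(state["items"]):
        if not isinstance(item, dict):
            problems.append(f"item {index}: must be an object")
            continue
        for field, expected in REQUIRED_FIELDS.items():
            if field not in item:
                problems.append(f"item {index}: missing {field!r}")
            elif not isinstance(item[field], expected):
                problems.append(f"item {index}: {field!r} has the wrong type")
        status = item.get("status")
        if isinstance(status, str) and status not in ALLOWED_STATUSES:
            problems.append(f"item {index}: unknown status {status!r}")
        item_id = item.get("id")
        if isinstance(item_id, str):
            if item_id in seen_ids:
                problems.append(f"item {index}: duplicate id {item_id!r}")
            seen_ids.add(item_id)
    return problems
```

Running a check like this after each write catches the moment the agent invents a new shape, while the drift is still one turn old.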

One Sentence

Write the unchanging goal in markdown, the per-turn state in JSON, point the /goal at the markdown, and the run survives any length because the only thing that grows lives behind a file read instead of inside the prompt.

Related

Pixel Diffs Are the Prototype Spec – The JSON ledger that fixed a wandering /goal, in full
Loop Engineering – The /goal verb, the gate, and why a loop needs a stop condition
Agent Memory Patterns – Externalizing state so a fresh agent can resume
The RALPH Loop – Fresh-context iteration backed by file-based memory
Markdown Files as State Machines – The cousin pattern when state is a workflow, not a ledger
Progressive Disclosure of Context – Loading state on demand instead of carrying it in the prompt
