Context is not a blob of text. It is the data pipeline that determines what the model can know for this decision.
Why it matters
The model only sees what you place in the request. Production agents fail when stale instructions, oversized transcripts, irrelevant retrievals, or untrusted tool output crowd out the facts that matter. Good context management makes the agent better grounded and cheaper to run at the same time.
Build this
- Separate lanes for policy, task instructions, user state, retrieved knowledge, tool results, and scratch artifacts.
- Ranking, pinning, trimming, summarisation, and expiry rules for every lane.
- Prompt and context bundle versions that can be compared in evals.
- Source provenance so the agent and reviewer can tell durable truth from generated text.
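One way to picture lanes, expiry, pinning, and provenance together is a small record per context block plus an assembly step. This is a minimal sketch, not a prescribed design: the `ContextBlock` fields, the lane names in `LANES`, and the rule that pinned blocks are exempt from expiry are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lane names, mirroring the list above.
# Their order here is the order in which lanes appear in the request.
LANES = ["policy", "task", "user_state", "retrieved", "tool_result", "scratch"]


@dataclass(frozen=True)
class ContextBlock:
    lane: str
    text: str
    source: str                  # provenance: where this text came from
    created_at: float            # seconds since epoch
    ttl: Optional[float] = None  # None means the block never expires
    pinned: bool = False         # assumption: pinned blocks ignore expiry
    score: float = 0.0           # relevance rank within the lane


def is_live(block: ContextBlock, now: float) -> bool:
    """A block is live if it has no TTL or its TTL has not elapsed."""
    return block.ttl is None or (now - block.created_at) <= block.ttl


def assemble(blocks, now: float):
    """Drop expired blocks, then order by lane, pinned first, best score first.

    Raises ValueError if a block names a lane that is not in LANES.
    """
    kept = [b for b in blocks if b.pinned or is_live(b, now)]
    return sorted(kept, key=lambda b: (LANES.index(b.lane), not b.pinned, -b.score))
```

Because every block carries its own `source`, a reviewer reading the assembled list can tell a policy document from a model-written scratch note without guessing.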
Watch for
- Stuffing whole files or whole histories into every request.
- Retrieved chunks with no source, timestamp, or reason for inclusion.
- Tool results trusted as instructions instead of treated as untrusted data.
- Context changes that cannot be replayed because prompts live only in code comments or dashboards.
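The tool-output pitfall above can be reduced by wrapping every tool result in an explicit data envelope before it enters the request. The sketch below assumes a hypothetical `<tool_output>` tag convention and a hypothetical `wrap_tool_output` helper; it only uses `html.escape` from the standard library. Tagging is a labelling convention, not a guarantee: it helps only if the instructions also tell the model to treat tagged content as data.

```python
import html


def wrap_tool_output(text: str, source: str) -> str:
    """Wrap raw tool output in a tagged envelope marked as untrusted.

    Angle brackets and ampersands in the body are escaped so the payload
    cannot close the envelope early and pose as instructions.
    """
    safe_body = html.escape(text, quote=False)
    safe_source = html.escape(source, quote=True)
    return (
        f'<tool_output source="{safe_source}" trust="untrusted">\n'
        f"{safe_body}\n"
        "</tool_output>"
    )
```

A payload that tries to break out, such as one containing a literal `</tool_output>`, ends up escaped inside the envelope rather than terminating it.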
Proof it works
- Traces show exactly which context blocks entered the request and why.
- A long-running session can shed stale context without losing current task state.
- A regression test catches when a prompt or retrieval change harms a known task.
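To make the trace and regression checks above concrete, the following sketch fingerprints a prompt-plus-context bundle and records which blocks entered a request and why. The function names `bundle_version` and `trace_record`, the 12-character truncation, and the shape of the trace dictionary are illustrative assumptions rather than an established format.

```python
import hashlib
import json


def bundle_version(prompt_template: str, block_texts: list) -> str:
    """Return a short, stable fingerprint for a prompt and its context blocks.

    Identical inputs always give the same fingerprint, so two eval runs
    can be compared by version instead of by diffing raw text.
    """
    payload = json.dumps(
        {"prompt": prompt_template, "blocks": block_texts},
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def trace_record(request_id: str, version: str, included: list) -> dict:
    """Build a trace entry from (block_id, lane, reason) tuples."""
    return {
        "request_id": request_id,
        "bundle_version": version,
        "blocks": [
            {"id": block_id, "lane": lane, "reason": reason}
            for block_id, lane, reason in included
        ],
    }
```

A regression suite can then store the bundle version next to each known task's expected result, so a failing task points directly at the prompt or retrieval change that introduced it.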
Implementation checklist
- Model the context window as a budget with named allocations.
- Keep instructions at the boundary, data in data lanes, and tool output quoted or tagged.
- Prefer pointers to bulky source material until the agent actually needs the content.
- Put context assembly under tests the same way you test application logic.
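The first checklist item, a budget with named allocations, can be sketched as a per-lane allowance that ranked blocks must fit into. Everything specific here is an assumption: the helper names, the whitespace-based `estimate_tokens` (a real system would use the model's own tokenizer), and the choice to skip an oversized block and keep trying smaller ones rather than stop at the first miss.

```python
def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def fit_lane(texts, allowance: int):
    """Keep texts, assumed ranked best first, until the allowance is spent.

    A block that does not fit is skipped, so a smaller lower-ranked
    block can still use the remaining allowance.
    """
    kept, used = [], 0
    for text in texts:
        cost = estimate_tokens(text)
        if used + cost > allowance:
            continue
        kept.append(text)
        used += cost
    return kept, used


def fit_to_budget(lanes: dict, budget: dict):
    """Apply each lane's allowance; lanes missing from the budget get zero."""
    request, report = {}, {}
    for lane, texts in lanes.items():
        allowance = budget.get(lane, 0)
        kept, used = fit_lane(texts, allowance)
        request[lane] = kept
        report[lane] = {
            "allowance": allowance,
            "used": used,
            "dropped": len(texts) - len(kept),
        }
    return request, report
```

Since `fit_to_budget` is a pure function of its inputs, it can sit under ordinary unit tests, and its report gives a per-lane account of what was dropped from each request.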