Build / Design and Development

02

Context Management

Treat context like a production data pipeline

Design the full context pipeline: instructions, task state, retrieved evidence, user history, tool results, compression, ordering, and expiry.

Principle

Context is not a blob of text. It is the data pipeline that determines what the model can know for this decision.

Why it matters

The model only sees what you place in the request. Production agents fail when stale instructions, oversized transcripts, irrelevant retrievals, or untrusted tool output crowd out the facts that matter. Good context management makes the agent grounded and cheaper at the same time.

Build this

  • Separate lanes for policy, task instructions, user state, retrieved knowledge, tool results, and scratch artifacts.
  • Ranking, pinning, trimming, summarisation, and expiry rules for every lane.
  • Prompt and context bundle versions that can be compared in evals.
  • Source provenance so the agent and reviewer can tell durable truth from generated text.

Watch for

  • Stuffing whole files or whole histories into every request.
  • Retrieved chunks with no source, timestamp, or reason for inclusion.
  • Tool results trusted as instructions instead of treated as untrusted data.
  • Context changes that cannot be replayed because prompts live only in code comments or dashboards.

Proof it works

  • Traces show exactly which context blocks entered the request and why.
  • A long-running session can shed stale context without losing current task state.
  • A regression test catches when a prompt or retrieval change harms a known task.

Implementation checklist

01

Model the context window as a budget with named allocations.

02

Keep instructions at the boundary, data in data lanes, and tool output quoted or tagged.

03

Prefer pointers to bulky source material until the agent actually needs the content.

04

Put context assembly under tests the same way you test application logic.

Related dictionary terms

Keep the framework connected.

Each factor is useful alone, but the system only becomes production-grade when build, run, and govern controls reinforce each other.

Return to the hub