The field guide

Context engineering

A practical guide to the discipline that decides what a model sees. Retrieval, agent patterns, reliability, evaluation, and the failure modes that break long-context systems, each with a runnable example.

The models got good at following instructions. That quietly moved the bottleneck. The hard part of building with an LLM is no longer wording a clever prompt, it is deciding what information the model gets to see at all. That job is context engineering, and it is most of the work in any serious system.

A model has no memory of you and no window onto your world beyond the text in front of it. So every answer is capped by the context you assemble: the files, the examples, the history, the tool results, in a finite window where everything you add crowds out something else. Get the right things in, keep the wrong things out, and shape what remains. That is the whole game, and this guide is a map of the moves.

Each section below links to a plain-English definition with a tested code example you can run. Start anywhere, or read it top to bottom as a path from the raw material through to the ways things break.

Browse all 12 terms in the Context Engineering Dictionary →

Foundations

What context engineering is, and the raw material it works with: the window, the tokens, the prompt.

Context engineering

Context engineering is the discipline of deciding what a model sees. Since a model can only work from the text in front of it, the quality of any answer is capped by the quality of the context you assemble.

Read definition →

Retrieval & RAG

Pulling the right information into the window at the right time, instead of hoping the model already knows it.

Chunking

Chunking is splitting a long document into smaller pieces before you embed and retrieve them. The size and overlap of the chunks decide what can be found as a unit, so it quietly makes or breaks a retrieval system.

Read definition →

Embeddings

An embedding turns a piece of text into a list of numbers that captures its meaning, so that similar ideas land near each other. Embeddings are what let you search by meaning instead of by exact keyword.

Read definition →

Retrieval-augmented generation (RAG)

RAG is the workhorse pattern of context engineering: retrieve the material relevant to a request, put it in the context, and let the model generate an answer grounded in it rather than guessing from memory.

Read definition →

Agent patterns

The shapes an LLM system can take, from fixed workflows to autonomous agents that choose their own path.

Agents vs. workflows

A workflow follows a path you designed in advance; an agent decides its own path at run time by calling tools in a loop toward a goal. Knowing which one you actually need is the first context-engineering decision.

Read definition →

Prompt chaining

Prompt chaining breaks a task into a fixed sequence of steps, feeding each step’s output into the next. It is the simplest workflow pattern, and it beats one giant prompt whenever a task has natural stages.

Read definition →

Routing

Routing classifies an input and sends it to the handler built for it. It keeps each path specialised and lets you send easy cases to a cheap model and hard cases to an expensive one, without any of the cost of a full agent.

Read definition →

Tool use

Tool use lets a model do more than produce text: you expose named actions with typed inputs, and the model calls them to read data, run code, or reach the outside world. It is the bridge from talking to doing.

Read definition →

Reliability techniques

Getting consistent, trustworthy output from a stochastic model that will not give the same answer twice.

Self-consistency

Self-consistency samples the same prompt several times and takes the majority answer. It trades a few extra calls for a big drop in variance, turning a model that sometimes slips into one that reliably lands on its best answer.

Read definition →

Evaluation

Measuring whether your system is actually any good, so you can improve it on purpose rather than by vibes.

LLM-as-judge

An LLM-as-judge uses one model call to score the output of another against a rubric. It is how you evaluate fuzzy, open-ended work at scale when there is no single correct answer to match against.

Read definition →

Failure modes

The predictable ways long-context systems break, so you can see them coming.

Lost in the middle

Lost in the middle is the tendency of models to use information at the start and end of a long context well, while missing what sits in the middle. It means a bigger context window does not automatically mean better recall.

Read definition →

Memory

Carrying the right state across turns and sessions without drowning the window in history.

Conversation history

Conversation history is the running list of past turns you re-send on every request so the model appears to remember. It is the simplest form of memory, and the first thing to overflow a context window if you never prune it.

Read definition →

Keep reading

AI Native Software Engineering

The other half of the picture: the vocabulary and workflow of building software with AI agents, from tokens and context windows to tools, subagents, and review discipline.

Go to the guide →

Want this applied to your product?

The dictionary is how I think out loud. If you want that thinking turned into a working system for your team, that is what I do.

See how I can help