Context engineering
A practical guide to the discipline that decides what a model sees. Retrieval, agent patterns, reliability, evaluation, and the failure modes that break long-context systems, each with a runnable example.
The models got good at following instructions. That quietly moved the bottleneck. The hard part of building with an LLM is no longer wording a clever prompt; it is deciding what information the model gets to see at all. That job is context engineering, and it is most of the work in any serious system.
A model has no memory of you and no window onto your world beyond the text in front of it. So every answer is capped by the context you assemble: the files, the examples, the history, the tool results, in a finite window where everything you add crowds out something else. Get the right things in, keep the wrong things out, and shape what remains. That is the whole game, and this guide is a map of the moves.
Each section below links to a plain-English definition with a tested code example you can run. Start anywhere, or read it top to bottom as a path from the raw material through to the ways things break.
Foundations
What context engineering is, and the raw material it works with: the window, the tokens, the prompt.
Retrieval & RAG
Pulling the right information into the window at the right time, instead of hoping the model already knows it.
Chunking
Chunking is splitting a long document into smaller pieces before you embed and retrieve them. The size and overlap of the chunks decide what can be found as a unit, so it quietly makes or breaks a retrieval system.
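As a rough illustration of how size and overlap interact, here is a minimal sketch of fixed-size chunking. It assumes character counts stand in for tokens, and the name `chunk_text` and its default values are made up for this example rather than taken from any library.

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into chunks of `size` characters, each sharing
    `overlap` characters with the chunk before it."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be at least 0 and smaller than size")

    step = size - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap is what lets a sentence that straddles a boundary still appear whole in at least one chunk. Real systems usually split on tokens, sentences, or headings rather than raw characters.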
Read definition →
Embeddings
An embedding turns a piece of text into a list of numbers that captures its meaning, so that similar ideas land near each other. Embeddings are what let you search by meaning instead of by exact keyword.
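To make "near each other" concrete, the sketch below compares vectors with cosine similarity, one common closeness measure. The three-number vectors are hand-written toys, not the output of a real embedding model, which would produce hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made toy vectors (an assumption for illustration only).
toy_embeddings = {
    "refund policy": [0.9, 0.1, 0.0],
    "money-back guarantee": [0.8, 0.2, 0.1],
    "office opening hours": [0.0, 0.2, 0.9],
}

refund = toy_embeddings["refund policy"]
for phrase, vector in toy_embeddings.items():
    print(f"{phrase}: {cosine_similarity(refund, vector):.2f}")
```

With these toy numbers, "money-back guarantee" scores far closer to "refund policy" than "office opening hours" does, even though the two phrases share no keywords.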
Read definition →
Retrieval-augmented generation (RAG)
RAG is the workhorse pattern of context engineering: retrieve the material relevant to a request, put it in the context, and let the model generate an answer grounded in it rather than guessing from memory.
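The retrieve, assemble, generate loop can be sketched in a few lines. This version makes two loud assumptions: retrieval is a crude word-overlap score standing in for embedding search, and `generate` is a hypothetical callable you would replace with your own model client.

```python
from typing import Callable


def score(query: str, doc: str) -> int:
    """Count the words the query and the document share (a crude stand-in
    for semantic similarity)."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents with the highest overlap score."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Put the retrieved passages in front of the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


def answer(query: str, docs: list[str],
           generate: Callable[[str], str], k: int = 2) -> str:
    """Retrieve, assemble the context, then hand it to the model."""
    return generate(build_prompt(query, retrieve(query, docs, k)))


docs = [
    "Refunds are issued within 14 days.",
    "The office opens at 9am.",
    "Shipping takes 3 days.",
]
# Echo the prompt instead of calling a model, to show what it would see.
print(answer("when are refunds issued", docs, generate=lambda p: p, k=1))
```

The point of the shape is that the model only sees what `retrieve` selected, so retrieval quality sets the ceiling on answer quality.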
Read definition →
Agent patterns
The shapes an LLM system can take, from fixed workflows to autonomous agents that choose their own path.
Agents vs. workflows
A workflow follows a path you designed in advance; an agent decides its own path at run time by calling tools in a loop toward a goal. Knowing which one you actually need is the first context-engineering decision.
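The difference is easiest to see side by side. In this toy sketch the tools are plain arithmetic, and `choose` is a hypothetical stand-in for the model's decision about what to do next; a real agent would ask an LLM at that point.

```python
from typing import Callable

# Toy tools (assumptions for illustration).
TOOLS: dict[str, Callable[[int], int]] = {
    "double": lambda x: x * 2,
    "increment": lambda x: x + 1,
}


def run_workflow(x: int) -> int:
    """Workflow: the path (double, then increment) is fixed in advance."""
    return TOOLS["increment"](TOOLS["double"](x))


def run_agent(x: int, goal: int, choose: Callable[[int, int], str],
              max_steps: int = 10) -> tuple[int, list[str]]:
    """Agent: the path is picked one step at a time, in a loop, until the
    chooser says "finish" or the step budget runs out."""
    trace = []
    for _ in range(max_steps):
        action = choose(x, goal)
        if action == "finish":
            break
        x = TOOLS[action](x)
        trace.append(action)
    return x, trace


def scripted_policy(value: int, goal: int) -> str:
    """Hand-written rule playing the role of the model."""
    if value == goal:
        return "finish"
    return "double" if value * 2 <= goal else "increment"


print(run_workflow(3))                    # 7
print(run_agent(3, 13, scripted_policy))  # (13, ['double', 'double', 'increment'])
```

Note the `max_steps` cap: once the path is chosen at run time, a step budget is the simplest guard against a loop that never finishes.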
Read definition →
Prompt chaining
Prompt chaining breaks a task into a fixed sequence of steps, feeding each step’s output into the next. It is the simplest workflow pattern, and it beats one giant prompt whenever a task has natural stages.
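A chain is little more than a loop over prompt templates. In this sketch, `call_model` is a hypothetical callable standing in for a real model client, and the `{input}` placeholder name is my own convention.

```python
from typing import Callable


def run_chain(steps: list[str], initial: str,
              call_model: Callable[[str], str]) -> str:
    """Run each template in order, feeding the previous output into the
    next template's {input} slot."""
    output = initial
    for template in steps:
        output = call_model(template.format(input=output))
    return output


# A fake model that wraps its prompt in brackets, so the nesting shows
# which step ran on which output.
def fake_model(prompt: str) -> str:
    return f"[{prompt}]"


steps = ["Outline: {input}", "Draft: {input}"]
print(run_chain(steps, "topic", fake_model))
# [Draft: [Outline: topic]]
```

Because every step is a separate call, you can inspect, validate, or retry the intermediate output before it reaches the next stage.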
Read definition →
Routing
Routing classifies an input and sends it to the handler built for it. It keeps each path specialised and lets you send easy cases to a cheap model and hard cases to an expensive one, without any of the cost of a full agent.
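A minimal sketch of the classify-then-dispatch shape follows. The keyword classifier is an assumption made to keep the example runnable (in practice the classifier is often a small model), and the labels and the model names `cheap-model` and `strong-model` are placeholders, not real products.

```python
def classify(text: str) -> str:
    """Assign a label with simple keyword rules (stand-in for a classifier)."""
    lowered = text.lower()
    if "refund" in lowered or "invoice" in lowered:
        return "billing"
    if "error" in lowered or "crash" in lowered:
        return "technical"
    return "general"


# Which model handles each label (placeholder names).
ROUTES = {
    "billing": "cheap-model",
    "technical": "strong-model",
    "general": "cheap-model",
}


def route(text: str) -> tuple[str, str]:
    """Return the label and the model that should handle this input."""
    label = classify(text)
    return label, ROUTES[label]


print(route("I want a refund for this invoice"))  # ('billing', 'cheap-model')
print(route("The app shows an error on login"))   # ('technical', 'strong-model')
```

The routing decision happens once, up front, which is why it costs far less than an agent that reconsiders its path at every step.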
Read definition →
Tool use
Tool use lets a model do more than produce text: you expose named actions with typed inputs, and the model calls them to read data, run code, or reach the outside world. It is the bridge from talking to doing.
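Here is one way the "named actions with typed inputs" idea might look on the application side. The `Tool` class, the registry, and the JSON call shape (`name` plus `arguments`) are assumptions for this sketch; each model provider defines its own format for tool definitions and tool calls.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Tool:
    name: str
    description: str
    parameters: dict[str, type]  # parameter name -> expected Python type
    func: Callable[..., Any]


def add(a: int, b: int) -> int:
    return a + b


REGISTRY = {
    "add": Tool("add", "Add two integers.", {"a": int, "b": int}, add),
}


def dispatch(tool_call_json: str) -> Any:
    """Validate a tool call (as the model might emit it) and run it."""
    call = json.loads(tool_call_json)
    tool = REGISTRY[call["name"]]
    args = call["arguments"]
    if set(args) != set(tool.parameters):
        raise ValueError(f"expected arguments {sorted(tool.parameters)}")
    for param, expected in tool.parameters.items():
        if not isinstance(args[param], expected):
            raise TypeError(f"{param} must be {expected.__name__}")
    return tool.func(**args)


print(dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}'))  # 5
```

Validating arguments before running anything matters because the call is model output, and model output can be malformed. The result of `dispatch` would normally be fed back into the context as a tool result.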
Read definition →
Reliability techniques
Getting consistent, trustworthy output from a stochastic model that can give a different answer each time you ask.
Evaluation
Measuring whether your system is actually any good, so you can improve it on purpose rather than by vibes.
Failure modes
The predictable ways long-context systems break, so you can see them coming.
Memory
Carrying the right state across turns and sessions without drowning the window in history.
AI Native Software Engineering
The other half of the picture: the vocabulary and workflow of building software with AI agents, from tokens and context windows to tools, subagents, and review discipline.
Go to the guide →
Want this applied to your product?
The dictionary is how I think out loud. If you want that thinking turned into a working system for your team, that is what I do.
See how I can help