The language of AI coding, defined.
69 plain-English definitions for the vocabulary behind AI coding agents. Written for engineers who want to actually understand the tools, not just use them. Every entry is free to read and free to copy.
Foundations
The base layer: what a model is, how it turns text into tokens, and why the same prompt can give different answers.
AI
In the coding-agent world, "AI" almost always means a large language model: a system that predicts the next chunk of text from everything it has been shown. It is not a mind and it is not a database. It is a very good pattern completer.
Read definition →Effort
Effort is a dial for how much internal reasoning a model spends before it answers. Turn it up for genuinely hard problems; you pay for it in latency and extra output tokens.
Read definition →Inference
Inference is the act of running a trained model to get an answer: text goes in, a prediction comes out. Every message you send to a coding agent is an inference. It is the opposite end of the lifecycle from training.
Read definition →Model
A model is the trained artifact at the centre of every AI coding tool: a large file of numbers (parameters) that, given some text, produces the most likely continuation. When people say "which model are you using," this is the thing they mean.
Read definition →Next-token prediction
Next-token prediction is the one job a language model does: given the text so far, predict the most likely next token, add it, and repeat. It is both the training objective and what runs at inference.
Read definition →Non-determinism
Non-determinism is why the same prompt can give you different answers. At inference the model samples among likely next tokens with a controlled amount of randomness, so runs vary.
Read definition →Parameters
Parameters are the learned numbers (weights) inside a model that hold everything it appears to know. The count of them is what people mean by model size, and they are fixed once training ends.
Read definition →Token
A token is the unit of text a model reads and writes: a chunk that is usually part of a word, not a whole word or a single character. Everything is measured in tokens, including your context window and your bill.
Read definition →Training
Training is the process that produces a model: showing it enormous amounts of text and adjusting its parameters until it gets good at predicting what comes next. It happens once, before you ever use the model.
Read definition →Providers & requests
How a coding agent actually talks to a model: the harness, the request, and what you pay for on the way in and out.
Cache tokens
Cache tokens are input tokens served from the prefix cache at a reduced rate. They are how prompt caching shows up as a separate line in your usage numbers.
Read definition →Harness
The harness is the code wrapped around a model that builds requests, runs tools, manages context, and enforces permissions. It is the agent minus the model, and it is where most of the real engineering lives.
Read definition →Input tokens
Input tokens are the tokens you send in a request: the system prompt, the conversation history, loaded files, and tool definitions. You are billed for them, and they count against the context window.
Read definition →Model provider
A model provider is the company or service that hosts a model behind an API. Your agent sends requests to it and gets completions back; you never run the model yourself.
Read definition →Model provider request
A model provider request is a single API call to the provider carrying the messages, tools, and settings for one step. It is the atomic unit of agent work, and one turn can be many requests.
Read definition →Output tokens
Output tokens are the tokens a model generates in its response, including any hidden reasoning. They are usually priced higher than input tokens, and turning up effort produces more of them.
Read definition →Prefix cache
A prefix cache lets a provider reuse the unchanged front of your request instead of reprocessing it, so repeated prefixes are cheaper and faster. It is the main reason keeping the start of your prompt stable pays off.
Read definition →Stateful
Stateful describes anything that keeps state across requests: conversation history, memory, a session. In an agent that job belongs to the harness or app, never to the stateless model API.
Read definition →Stateless
Stateless means the model API keeps no memory between requests. Each call starts blank, so every request must carry all the context the model needs. This is foundational to how agents are built.
Read definition →Context
Everything the model can see for a single request, and the moves you make to keep it useful as work grows.
Autocompact
Autocompact is the agent compacting the context automatically when the window nears full. Convenient, but it can silently drop detail you cared about.
Read definition →Clearing
Clearing is deliberately wiping the context to start fresh. It is often the cleanest fix for a bloated or confused window.
Read definition →Compaction
Compaction is condensing older conversation history into a summary to reclaim context-window space while keeping the important gist. It is lossy by design.
Read definition →Context
Context is all the text a model can see for a single request: the system prompt, your message, the conversation so far, and any files or tool output the agent has pulled in. It is the only thing the model knows about your specific situation.
Read definition →Context window
The context window is the maximum amount of text, measured in tokens, that a model can consider for a single request. It is a hard ceiling, and it is the main resource you manage when working with an agent.
Read definition →Lost in the middle
Lost in the middle is the well-known tendency for models to attend best to the start and end of a long context and to miss information buried in the middle.
Read definition →Session
A session is one continuous conversation with an agent that accumulates history in the context window. Resetting or ending it clears that history and starts the agent from a blank slate.
Read definition →System prompt
The system prompt is the standing instruction placed at the very start of the context that sets the model’s role, rules, and tone before the conversation begins. It shapes every reply without being part of the back-and-forth.
Read definition →Turn
A turn is one round of the agent loop: your input, the model doing its work (possibly several tool calls), and its response. A single turn can span many provider requests.
Read definition →Agents & tools
The loop that turns a chat model into something that reads files, runs commands, and edits your codebase.
Agent
An agent is a language model wrapped in a loop that lets it call tools, read the results, and decide what to do next. The model supplies the judgement; the loop and the tools give it hands.
Read definition →Agent mode
Agent mode is a setting where the model runs the loop autonomously, planning and acting on its own, rather than giving a single chat reply or edit. More capable, and it needs more trust.
Read definition →Environment
The environment is the surroundings an agent acts in: working directory, files, shell, environment variables, and network. It defines what the agent's tools can actually reach.
Read definition →Filesystem
The filesystem is the set of files an agent can read and write. It is its main source of truth and its main way to make durable changes.
Read definition →MCP (Model Context Protocol)
MCP is an open standard for connecting agents to tools and data. Instead of hard-coding an integration into every agent, you run an MCP server once and any MCP-aware agent can use it.
Read definition →Skill
A skill is a packaged, reusable set of instructions an agent loads on demand for a specific kind of task. It is progressive disclosure of know-how instead of cramming everything into the system prompt.
Read definition →Subagent
A subagent is a separate agent that a main agent spawns to handle a scoped subtask, with its own fresh context. It does the work, returns a short result, and the noise of how it got there never touches the main conversation.
Read definition →Tool
A tool is a named action, with a typed input schema, that a model is allowed to call. Tools are how a model that can only produce text gets to actually do things: read a file, run a command, search the web.
Read definition →Tool call
A tool call is the model’s request to use a tool: it names the tool and supplies the arguments, then pauses. It has not run anything. Your harness is what actually executes the action.
Read definition →Tool result
A tool result is the output of running a tool, fed back into the conversation so the model can use it. It is tied to the tool call that requested it, and it is how the model sees the consequences of its own actions.
Read definition →Permissions & safety
The guardrails that decide what an agent may touch, and when a human has to say yes.
AFK
AFK means running an agent unattended for long stretches while you are away from the keyboard. It is only safe with strong guardrails and automated checks, since no human is watching each step.
Read definition →Human in the loop
Human in the loop means keeping a person in the agent's decision path to approve, steer, or verify its work. It is the deliberate counterweight to full autonomy.
Read definition →Permission mode
Permission mode is the policy that decides which actions an agent can take on its own and which ones need your approval, ranging from ask-every-time to full auto. It trades safety for flow.
Read definition →Permission request
A permission request is the moment an agent stops and asks you to approve a consequential action, such as running a command or writing a file, before it happens. It is the seam where a human can catch a mistake before it lands.
Read definition →Sandbox
A sandbox is an isolated environment that limits what an agent can touch, such as the filesystem and network, so a mistake stays contained and cannot damage the real system.
Read definition →Knowledge & failure modes
What the model knows, where that knowledge ends, and the predictable ways it gets things wrong.
Attention
Attention is the mechanism a model uses to weigh how strongly each token in its context relates to the others when predicting the next one. It is the basis of how a model actually uses context.
Read definition →Attention budget
The attention budget is the idea that a model's effective attention is a finite resource spread across the whole context window. The more you put in, the thinner the attention on each piece.
Read definition →Attention degradation
Attention degradation is the quality drop a model shows as its context grows: recall weakens and it misses or confuses buried details, often well below the hard token limit. It is also called context rot.
Read definition →Contextual knowledge
Contextual knowledge is what a model knows because it is in the context right now: the files, docs, and output you gave it. It is current and grounded, and it is the main lever against hallucination.
Read definition →Hallucination
A hallucination is a confident, plausible-sounding output that is simply wrong: an invented API, a fabricated file path, a made-up citation. It is not the model lying. It is the model doing exactly what it always does, predicting plausible text, with no built-in sense of truth.
Read definition →Knowledge cutoff
The knowledge cutoff is the date after which a model learned nothing from training. It is a common source of outdated APIs, so give the model current docs to compensate.
Read definition →Parametric knowledge
Parametric knowledge is what a model knows from training, stored in its parameters. It is broad and instantly available but frozen, unsourced, and not always reliable.
Read definition →Sycophancy
Sycophancy is a model's tendency to agree with you and tell you what you want to hear rather than push back. It is why "am I right?" is a leading question that produces a leading answer.
Read definition →Context engineering
Deliberately shaping what goes into the context window: memory, specs, handoffs, and progressive disclosure.
AGENTS.md
AGENTS.md is a project file of standing instructions and conventions that an agent loads into context at the start of a session. It gives a repo its own durable memory, checked into version control next to the code.
Read definition →Context pointer
A context pointer is a reference (a path, URL, or id) you give an agent instead of the full content, so it can fetch the material only if and when it needs it. It is a cheap way to make a lot of context available.
Read definition →Handoff
A handoff is passing work from one session or agent to the next by summarising the current state, so the successor can continue without relearning everything. It is the antidote to a dead or overflowing window.
Read definition →Handoff artifact
A handoff artifact is the concrete document produced at a handoff, recording what is done, what is next, and the key decisions. The next session reads it to get up to speed fast.
Read definition →Memory system
A memory system is an external store the harness uses to persist facts across sessions and reload them into context. It is how a stateless model ends up behaving as if it remembers you and your project.
Read definition →Primary source
A primary source is the authoritative original: the actual code, the real types, the official docs. Point agents at primary sources so they read reality instead of guessing from memory.
Read definition →Progressive disclosure
Progressive disclosure is revealing detail to the model only when it is needed, via pointers and on-demand loading, instead of putting everything into context up front. It saves window space and attention.
Read definition →Secondary source
A secondary source is second-hand information: blog posts, summaries, or the model's own memory. It is useful for orientation but must be checked against the primary source before you rely on it.
Read definition →Spec
A spec is a written description of what to build and why, handed to the agent up front. Specs-as-context reliably beat vague one-line requests.
Read definition →Ticket
A ticket is a scoped unit of work carrying enough context to act on. Well-formed tickets are ideal agent inputs.
Read definition →Workflow & practice
How people actually work with agents day to day, from vibe coding to review discipline.
Automated check
An automated check is a machine-verifiable gate that agent output has to pass, like tests, a type check, a linter, or a build. It either passes or fails with no judgment, which makes it the backbone of trusting agent code, especially when you are running unattended.
Read definition →Automated review
Automated review is putting a change through an AI reviewer before a person sees it, so a second agent flags likely bugs, missed edge cases, and smells. It catches the obvious cheaply, but it does not replace human judgment about whether the change is right.
Read definition →AX
AX, agent experience, is how well a codebase or tool is set up for AI agents to work inside it: clear structure, written-down conventions, an AGENTS.md, and automated checks the agent can verify against. It is the emerging sibling of developer experience.
Read definition →Design doc
A design doc is a short written description of how you plan to build something, written before you build it. It forces you and the agent to commit to an approach and surfaces problems while they are still cheap to fix.
Read definition →DX
DX, developer experience, is how good it feels for a human to work with a tool, library, or codebase: fast feedback, clear errors, sensible defaults, docs that answer the real question. It still matters in the agent era, and it tends to track how well agents work in the same codebase.
Read definition →Human review
Human review is a person actually reading what an agent produced, understanding it, and taking responsibility for shipping it. It is the final quality gate that tests and automated review can support but never replace.
Read definition →Prototyping
Prototyping is using an agent to throw together a rough, disposable version of something fast, so you can see an idea working and decide what to actually build. You optimise for speed and learning, not polish, and you discard the result freely.
Read definition →Self-critique
Self-critique is asking a model to attack its own output, or having a fresh agent do it, to catch bugs and bad assumptions before you ship. It is a direct counter to a model's tendency to agree with whatever it just produced.
Read definition →Vibe coding
Vibe coding is building software by prompting an agent and steering on feel, accepting the code it writes without reading every line. It is fast and freeing for prototypes and personal tools, and genuinely risky the moment the code has to run in production.
Read definition →