Most systems people call "agents" are really workflows, and that is usually fine. The distinction is not marketing: it changes how you build, evaluate, and pay for the thing.
A workflow is a fixed path: you decide the steps in advance and the model fills in each one. Classify a ticket, then route it. Draft copy, then translate it. An agent is different: give a model tools and run it in a loop, and it decides which step comes next based on what it has learned so far. In a workflow you design the path. In an agent you design the environment and the agent finds the path. Anthropic's public write-up, Building Effective Agents, is the clearest taxonomy of these patterns and worth reading alongside this entry.
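The ticket example can be sketched as a workflow in a few lines. This is an illustration under stated assumptions, not any particular library's API: Model is a hypothetical "prompt in, text out" function you would back with a real client, and the queue names and destinations are made up. What it shows is that the model fills in one step while ordinary code owns the order of steps.

```typescript
// Hypothetical model interface (an assumption): prompt in, text out.
type Model = (prompt: string) => Promise<string>;

// Illustrative queue labels, not taken from any real system.
type Queue = 'billing' | 'technical' | 'general';
const QUEUES: readonly Queue[] = ['billing', 'technical', 'general'];

// Step 1: the model fills in one decision.
// Anything that is not a known label falls back to 'general'.
async function classifyTicket(model: Model, ticket: string): Promise<Queue> {
  const raw = await model(
    `Classify this support ticket as one of: ${QUEUES.join(', ')}.\n` +
      `Reply with the label only.\n\nTicket: ${ticket}`,
  );
  const label = raw.trim().toLowerCase();
  return QUEUES.find((q) => q === label) ?? 'general';
}

// Step 2: plain code does the routing. The model never picks the next step.
function routeTicket(queue: Queue): string {
  const destinations: Record<Queue, string> = {
    billing: 'billing-team',
    technical: 'support-engineers',
    general: 'front-desk',
  };
  return destinations[queue];
}

// The workflow: two steps, always in this order.
async function handleTicket(model: Model, ticket: string): Promise<string> {
  const queue = await classifyTicket(model, ticket);
  return routeTicket(queue);
}
```

Because the path is fixed, each step can be tested on its own with a stubbed model, which is a large part of why workflows are easier to trust.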
An agent is a loop with tools
The whole idea fits in a few lines. Give the model a tool, let it call the tool, feed the result back, and keep going until it is done. The Vercel AI SDK runs that loop for you when you pass tools and a stop condition:
```ts
import { generateText, tool, stepCountIs } from 'ai'
import { openai } from '@ai-sdk/openai'
import { z } from 'zod'

// A tool is a description, an input schema, and a function to run.
const getWeather = tool({
  description: 'Get the current weather for a city',
  inputSchema: z.object({ city: z.string() }),
  // Stubbed result; a real tool would call a weather API here.
  execute: async ({ city }) => ({ city, tempC: 21 }),
})

// The loop stops when the model answers or after five steps.
const { text, steps } = await generateText({
  model: openai('gpt-5-mini'),
  tools: { getWeather },
  stopWhen: stepCountIs(5),
  prompt: 'What is the weather in Lisbon? Use the tool, then answer in one sentence.',
})
```

The model asks for the tool, the SDK runs it, the result comes back, and the model answers. The steps array records each hop so you can see what it did.
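To see roughly what a loop like this involves, it can be written by hand. The following is a minimal sketch, not the AI SDK's implementation: ChatModel, Message, ModelTurn, Tool, and runAgent are hypothetical names invented for illustration, and the model is any function you supply. Real SDKs use richer message types and handle errors, streaming, and parallel tool calls.

```typescript
// Hypothetical shapes for illustration only.
type ToolCall = { tool: string; input: Record<string, unknown> };
type ModelTurn =
  | { kind: 'tool-call'; call: ToolCall }
  | { kind: 'answer'; text: string };
type Message = { role: 'user' | 'assistant' | 'tool'; content: string };
type ChatModel = (messages: Message[]) => Promise<ModelTurn>;
type Tool = (input: Record<string, unknown>) => Promise<unknown>;

async function runAgent(
  model: ChatModel,
  tools: Record<string, Tool | undefined>,
  prompt: string,
  maxSteps = 5,
): Promise<{ text: string; steps: ModelTurn[] }> {
  const messages: Message[] = [{ role: 'user', content: prompt }];
  const steps: ModelTurn[] = [];

  for (let i = 0; i < maxSteps; i++) {
    // One model call per step, always with the full history so far.
    const turn = await model(messages);
    steps.push(turn);

    // The model decided it is done: return its answer.
    if (turn.kind === 'answer') return { text: turn.text, steps };

    // Otherwise run the requested tool and feed the result back in.
    const tool = tools[turn.call.tool];
    const result = tool
      ? await tool(turn.call.input)
      : { error: `unknown tool: ${turn.call.tool}` };
    messages.push({ role: 'assistant', content: JSON.stringify(turn.call) });
    messages.push({ role: 'tool', content: JSON.stringify(result) });
  }

  // Step budget exhausted without a final answer.
  return { text: '', steps };
}
```

The maxSteps cap plays the same role as the stop condition above: without it, a model that keeps requesting tools would loop, and bill, indefinitely.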
Which one do you need?
- Start with a workflow. If the path is predictable, a fixed pipeline is cheaper, faster, and easier to trust.
- Reach for an agent only when the solution path genuinely cannot be known in advance: open-ended research, multi-tool support, messy edge cases.
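- Mind the cost. Every loop step is another model call, and each call typically resends the conversation so far, so later steps carry more input than earlier ones. A five-step agent run can easily cost several times what a two-step workflow costs.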
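The cost point can be made concrete with back-of-the-envelope arithmetic. The sketch below counts input tokens only and rests on assumptions: every agent step resends the full history, each step adds a fixed number of tokens (tool call plus result), and each workflow step gets a fresh prompt of fixed size. The function names and the token figures are illustrative, not measurements.

```typescript
// Input tokens for an agent loop that resends the whole history each step.
// Step k (0-based) sees the prompt plus everything added by the k earlier steps.
function agentInputTokens(
  steps: number,
  promptTokens: number,
  tokensAddedPerStep: number,
): number {
  let total = 0;
  for (let step = 0; step < steps; step++) {
    total += promptTokens + step * tokensAddedPerStep;
  }
  return total;
}

// Input tokens for a fixed pipeline where each step gets a fresh prompt
// of roughly the same size (an assumption; real pipelines vary).
function workflowInputTokens(steps: number, promptTokens: number): number {
  return steps * promptTokens;
}

// Illustrative numbers: a 500-token prompt, 300 tokens added per agent step.
const agent = agentInputTokens(5, 500, 300); // 5500
const workflow = workflowInputTokens(2, 500); // 1000
console.log({ agent, workflow, ratio: agent / workflow }); // ratio: 5.5
```

Under these assumptions the five-step agent reads 5.5 times the input of the two-step workflow, not 2.5 times, because the history grows with every step. Prompt caching and output tokens would change the exact figure.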
Related terms
Self-consistency
Self-consistency samples the same prompt several times and takes the majority answer. It trades a few extra calls for a big drop in variance, turning a model that sometimes slips into one that reliably lands on its best answer.
LLM-as-judge
An LLM-as-judge uses one model call to score the output of another against a rubric. It is how you evaluate fuzzy, open-ended work at scale when there is no single correct answer to match against.
Retrieval-augmented generation (RAG)
RAG is the workhorse pattern of context engineering: retrieve the material relevant to a request, put it in the context, and let the model generate an answer grounded in it rather than guessing from memory.