A model keeps no memory between calls, so a chat that "remembers" is really you re-sending every prior turn each time. Conversation history is that growing list of messages. Append the model's reply and the next user turn, send the whole thing back, and the model can resolve "it" and "that" to what came before.
It is just a growing array
Each turn adds to the list, and the full list goes on the next request:
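```ts
// ModelMessage is the message type exported by AI SDK 5 (named CoreMessage in v4).
import { generateText, type ModelMessage } from 'ai'
import { openai } from '@ai-sdk/openai'

const model = openai('gpt-5-mini')

// The history starts with the first user turn. The explicit type keeps
// `role` from widening to `string` when the code is type-checked.
const messages: ModelMessage[] = [
  { role: 'user', content: 'My name is James. Remember it.' },
]
const first = await generateText({ model, messages })

// Append the model's reply, then the next user turn.
messages.push({ role: 'assistant', content: first.text })
messages.push({ role: 'user', content: 'What is my name?' })

// Send the whole list back on the second request.
const second = await generateText({ model, messages })
// second.text -> "You mentioned that your name is James..."
```

Because the earlier turn is still in the messages array, the model can answer the follow-up.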
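Re-sending the full array means the input grows with every turn. The sketch below is a rough way to see that growth. It assumes about four characters per token, which is a common rule of thumb and not a real tokenizer, and the `Message` type, `estimateTokens`, and `cumulativeInputTokens` are hypothetical names used only for illustration.

```ts
// Minimal message shape for this sketch (not the SDK's own type).
type Message = { role: 'user' | 'assistant'; content: string }

// Assumption: ~4 characters per token. Use a real tokenizer for accurate counts.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

// Estimated input tokens sent over a whole conversation, given that each
// request re-sends every message that came before it.
function cumulativeInputTokens(history: Message[]): number {
  let total = 0
  let running = 0
  for (const message of history) {
    running += estimateTokens(message.content)
    // Each user turn triggers one request carrying everything so far.
    if (message.role === 'user') total += running
  }
  return total
}
```

Because each request carries all earlier turns, the total grows faster than the conversation itself: doubling the number of turns more than doubles the tokens sent.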
The catch
Conversation history works beautifully for short chats and breaks down for long ones. Fifty turns in, you are spending thousands of tokens re-sending old context before the model even reads the new question, and the useful part gets buried (see "lost in the middle"). That is where smarter memory strategies come in:
- Summarise the old turns into a compact recap instead of shipping them verbatim.
- Retrieve only the relevant past turns with embeddings, rather than all of them.
- Trim anything the current task does not need.
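As one illustration of the trimming strategy, the sketch below keeps only the most recent messages that fit a token budget. It is a simplification under stated assumptions: the size estimate is a rough four-characters-per-token heuristic, and `Message`, `approxTokens`, and `trimToBudget` are hypothetical names, not part of any SDK.

```ts
// Minimal message shape for this sketch (not the SDK's own type).
type Message = { role: 'system' | 'user' | 'assistant'; content: string }

// Assumption: ~4 characters per token. Use a real tokenizer for accurate counts.
const approxTokens = (text: string): number => Math.ceil(text.length / 4)

// Keep the newest messages whose combined estimate fits within `budget`.
function trimToBudget(history: Message[], budget: number): Message[] {
  const kept: Message[] = []
  let used = 0
  // Walk from newest to oldest so recent turns are kept first.
  for (const message of [...history].reverse()) {
    const cost = approxTokens(message.content)
    if (used + cost > budget) break
    kept.unshift(message) // restore chronological order
    used += cost
  }
  return kept
}
```

You would pass the result as `messages` in place of the full history. This version drops the oldest turns outright, including any system message at the start, so a fuller implementation would probably pin that message and replace the dropped turns with a summary.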
Related terms
Context engineering
Context engineering is the discipline of deciding what a model sees. Since a model can only work from the text in front of it, the quality of any answer is capped by the quality of the context you assemble.
Agents vs. workflows
A workflow follows a path you designed in advance; an agent decides its own path at run time by calling tools in a loop toward a goal. Knowing which one you actually need is the first context-engineering decision.