You cannot retrieve a whole book. Chunking is the step where you cut a long document into smaller pieces so each one can be embedded, stored, and retrieved on its own. It sounds mechanical, but the choices you make here decide what your system can actually find.
The two dials
Every chunker has a size and an overlap. Size sets how much text lives in one retrievable unit; overlap repeats a little text between neighbours so a fact that straddles a boundary is not cut in half. A minimal splitter looks like this:
function chunk(text, size = 500, overlap = 50) {
const words = text.split(/\s+/).filter(Boolean)
const out = []
for (let i = 0; i < words.length; i += size - overlap) {
out.push(words.slice(i, i + size).join(' '))
}
return out
}Real chunkers split on paragraph or sentence boundaries rather than raw word counts, so a chunk holds a complete thought instead of half of one.
Getting the size right
- Too small and a chunk loses the context that makes it meaningful; you retrieve a sentence with no idea what it refers to.
- Too large and each chunk covers several topics, so retrieval gets vague and you waste tokens pulling in irrelevant text.
- Overlap buys safety at the cost of some duplication. A little overlap stops facts from being sliced across a boundary.
Related terms
Retrieval-augmented generation (RAG)
RAG is the workhorse pattern of context engineering: retrieve the material relevant to a request, put it in the context, and let the model generate an answer grounded in it rather than guessing from memory.
Read definition →Embeddings
An embedding turns a piece of text into a list of numbers that captures its meaning, so that similar ideas land near each other. Embeddings are what let you search by meaning instead of by exact keyword.
Read definition →Lost in the middle
Lost in the middle is the tendency of models to use information at the start and end of a long context well, while missing what sits in the middle. It means a bigger context window does not automatically mean better recall.
Read definition →