Chunking

You cannot retrieve a whole book. Chunking is the step where you cut a long document into smaller pieces so each one can be embedded, stored, and retrieved on its own. It sounds mechanical, but the choices you make here decide what your system can actually find.

The two dials

Every chunker has a size and an overlap. Size sets how much text lives in one retrievable unit; overlap repeats a little text between neighbours so a fact that straddles a boundary is not cut in half. A minimal splitter looks like this:

TypeScript

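function chunk(text: string, size = 500, overlap = 50): string[] {
  // The step (size - overlap) must be positive, or the loop never advances.
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError('need size > 0 and 0 <= overlap < size')
  }
  const words = text.split(/\s+/).filter(Boolean)
  const out: string[] = []
  for (let i = 0; i < words.length; i += size - overlap) {
    out.push(words.slice(i, i + size).join(' '))
    // Stop once a chunk reaches the end, so the tail is not emitted twice.
    if (i + size >= words.length) break
  }
  return out
}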

Production chunkers usually split on paragraph or sentence boundaries rather than raw word counts, so a chunk holds a complete thought instead of half of one.
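As a rough illustration of boundary-aware splitting, the sketch below packs whole sentences into chunks and carries the last sentence forward as overlap. The name chunkBySentence and its parameters maxWords and overlapSentences are hypothetical, and the regular expression is a naive stand-in for a proper sentence tokenizer (it will mis-split abbreviations such as "e.g.").

```typescript
// Hypothetical helper: packs whole sentences into chunks of roughly `maxWords` words.
// A single sentence longer than `maxWords` still becomes its own oversized chunk.
function chunkBySentence(text: string, maxWords = 500, overlapSentences = 1): string[] {
  if (maxWords <= 0 || overlapSentences < 0) {
    throw new RangeError('need maxWords > 0 and overlapSentences >= 0')
  }
  // Naive sentence split: a run of text followed by any ., ! or ? characters.
  const sentences = (text.match(/[^.!?]+[.!?]*/g) ?? [])
    .map(s => s.trim())
    .filter(Boolean)
  const countWords = (s: string): number => s.split(/\s+/).filter(Boolean).length

  const out: string[] = []
  let current: string[] = []
  let currentWords = 0
  for (const sentence of sentences) {
    const n = countWords(sentence)
    if (current.length > 0 && currentWords + n > maxWords) {
      out.push(current.join(' '))
      // Carry the last few sentences forward as overlap.
      current = overlapSentences > 0 ? current.slice(-overlapSentences) : []
      currentWords = current.reduce((sum, s) => sum + countWords(s), 0)
      // Drop the overlap if it would not fit next to the new sentence.
      if (currentWords + n > maxWords) {
        current = []
        currentWords = 0
      }
    }
    current.push(sentence)
    currentWords += n
  }
  if (current.length > 0) out.push(current.join(' '))
  return out
}
```

Measuring overlap in sentences rather than words is a design choice made here so that the repeated text is itself a complete thought; other splitters make different trade-offs.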

Getting the size right

Too small and a chunk loses the context that makes it meaningful; you retrieve a sentence with no idea what it refers to.
Too large and each chunk covers several topics, so retrieval gets vague and you waste tokens pulling in irrelevant text.
Overlap buys safety at the cost of duplication: a little stops facts from being sliced across a boundary, while a lot mostly stores the same text twice.
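To put numbers on that trade-off, here is a small sketch that estimates how many chunks a word-based splitter produces and how much text the overlap duplicates. It assumes a splitter that advances by size minus overlap words and stops as soon as a chunk reaches the end of the text; estimateChunks and the ChunkEstimate shape are hypothetical names used only for illustration.

```typescript
// Hypothetical result shape for the estimate below.
interface ChunkEstimate {
  chunks: number          // number of chunks produced
  duplicatedWords: number // words stored more than once because of overlap
  overhead: number        // duplicatedWords as a fraction of the original length
}

function estimateChunks(totalWords: number, size = 500, overlap = 50): ChunkEstimate {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError('need size > 0 and 0 <= overlap < size')
  }
  if (totalWords <= 0) return { chunks: 0, duplicatedWords: 0, overhead: 0 }

  const step = size - overlap
  // One chunk covers the first `size` words; each further chunk adds `step` new words.
  const chunks = totalWords <= size ? 1 : 1 + Math.ceil((totalWords - size) / step)
  // Every chunk after the first repeats `overlap` words from its predecessor.
  const duplicatedWords = (chunks - 1) * overlap
  return { chunks, duplicatedWords, overhead: duplicatedWords / totalWords }
}
```

With the defaults, a 10,000-word document comes out at 23 chunks and 1,100 duplicated words, roughly 11% extra storage and embedding work.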

Tip

Chunking has come back into fashion despite huge context windows, because feeding a model only the relevant pieces avoids lost-in-the-middle effects and cuts token cost. Retrieve the chunk, not the whole file.
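A minimal sketch of what "retrieve the chunk" can look like once chunks are embedded: rank them by cosine similarity to the query vector and keep only the best few. The EmbeddedChunk type and the topK function are hypothetical, and the vectors are assumed to come from whatever embedding model the system uses; a real deployment would more likely delegate this search to a vector index.

```typescript
// Hypothetical shape: a chunk paired with a vector from some embedding model.
interface EmbeddedChunk {
  text: string
  vector: number[]
}

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new RangeError('vectors must have the same length')
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB)
  // Treat a zero vector as having no similarity to anything.
  return denom === 0 ? 0 : dot / denom
}

// Return the k chunks most similar to the query, best first.
function topK(query: number[], chunks: EmbeddedChunk[], k = 3): EmbeddedChunk[] {
  return chunks
    .map(chunk => ({ chunk, score: cosine(query, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map(scored => scored.chunk)
}
```

Only the k returned chunks go into the prompt, which is where the token saving comes from.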

Related terms

Retrieval-augmented generation (RAG)

Embeddings

Lost in the middle
