Embeddings - Context Engineering Dictionary

An embedding is a vector: a fixed-length list of numbers that represents the meaning of a piece of text. Texts that mean similar things get vectors that point in similar directions, so you can measure how related two pieces of text are with simple arithmetic. This is the engine under semantic search and RAG.

Meaning as geometry

The trick is that closeness in vector space tracks closeness in meaning. "Cats are great pets" and "feline companions" share almost no words, but their embeddings sit close together. A sentence about tax filing sits far away. You compare vectors with cosine similarity, which returns a number from -1 to 1:

TypeScript

// Runs as an ES module (top-level await) and calls the OpenAI API,
// so it needs an API key configured for the provider.
import { embed, cosineSimilarity } from 'ai'
import { openai } from '@ai-sdk/openai'

// Use one embedding model for every text you want to compare.
const model = openai.embedding('text-embedding-3-small')
const a = await embed({ model, value: 'cats are great pets' })
const b = await embed({ model, value: 'feline companions' })
const c = await embed({ model, value: 'quarterly tax filing' })

// Higher score means closer in meaning.
cosineSimilarity(a.embedding, b.embedding) // ~0.59 (close in meaning)
cosineSimilarity(a.embedding, c.embedding) // ~0.11 (unrelated)

The same idea in different words scores high; a different idea scores low. That is the whole basis of retrieving by meaning.
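
To make the "simple arithmetic" concrete, here is a minimal sketch of cosine similarity written by hand: the dot product of the two vectors divided by the product of their lengths. The function name `cosine` and the two-dimensional vectors are made up for illustration; real embeddings have hundreds or thousands of dimensions, and in practice you would likely use a library helper instead.

```typescript
// Cosine similarity: dot product divided by the product of the vector lengths.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`)
  }
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  if (normA === 0 || normB === 0) {
    throw new Error('cosine similarity is undefined for a zero vector')
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Toy vectors, invented for this example.
const sameDirection = cosine([3, 4], [6, 8]) // 1: same direction, different length
const orthogonal = cosine([1, 0], [0, 1])    // 0: nothing in common
const opposite = cosine([1, 0], [-1, 0])     // -1: opposite directions
```

Note that the length of a vector does not affect the score, only its direction does. That is why "point in similar directions" is the right mental picture.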

Why it matters

Keyword search misses paraphrases. Embeddings find the passage that answers the question even when it shares no words with it.
You embed once, search many times. Precompute the vectors for your corpus, then each query is a fast similarity comparison.
The model choice matters. Different embedding models capture different nuances and have different vector sizes; pick one and use it consistently for the corpus and the queries.
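
The "embed once, search many times" idea can be sketched as a small in-memory index. Everything here is an assumption for illustration: the `IndexedChunk` type, the `search` function, and the three-dimensional vectors are hypothetical stand-ins for vectors that an embedding model would have produced ahead of time. The dimension check shows why the corpus and the query need the same model: vectors of different sizes cannot be compared at all.

```typescript
type IndexedChunk = { text: string; vector: number[] }

function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    // Typically means the query and the corpus were embedded with different models.
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`)
  }
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Score every stored vector against the query and keep the topK best matches.
function search(index: IndexedChunk[], queryVector: number[], topK: number) {
  return index
    .map(({ text, vector }) => ({ text, score: cosine(queryVector, vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
}

// Precomputed once for the corpus (toy values, not real model output).
const index: IndexedChunk[] = [
  { text: 'cats are great pets', vector: [0.9, 0.1, 0.0] },
  { text: 'quarterly tax filing', vector: [0.0, 0.2, 0.9] },
  { text: 'dogs need daily walks', vector: [0.7, 0.6, 0.1] },
]

// Stands in for the embedding of a query such as 'feline companions'.
const queryVector = [0.8, 0.2, 0.1]
const results = search(index, queryVector, 2)
// results lists 'cats are great pets' first, then 'dogs need daily walks'
```

A linear scan like this is fine for small corpora. Larger ones usually move the vectors into a vector database or an approximate nearest-neighbor index, but the comparison underneath is the same.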

Tip

Embeddings are only as good as what you feed them. Embedding a whole document as one vector blurs its meaning; that is why you usually chunk first and embed the pieces.
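
As a rough sketch of "chunk first", the hypothetical helper below splits text into fixed-size windows that overlap slightly, so a sentence cut at one boundary still appears whole in a neighboring piece. The name `chunkText`, the character-based sizing, and the `size` and `overlap` parameters are assumptions for illustration; real pipelines often split on sentences, paragraphs, or tokens instead.

```typescript
// Split text into windows of `size` characters; each window shares
// `overlap` characters with the one before it.
function chunkText(text: string, size: number, overlap: number): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error('need size > 0 and 0 <= overlap < size')
  }
  const chunks: string[] = []
  const step = size - overlap
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size))
    if (start + size >= text.length) break // this window reached the end
  }
  return chunks
}

const chunks = chunkText('abcdefghij', 4, 1)
// ['abcd', 'defg', 'ghij']
```

Each chunk is then embedded on its own, so a query can match the one passage that answers it rather than a blurred average of the whole document.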

Related terms

Retrieval-augmented generation (RAG)

RAG is the workhorse pattern of context engineering: retrieve the material relevant to a request, put it in the context, and let the model generate an answer grounded in it rather than guessing from memory.

Chunking

Chunking is splitting a long document into smaller pieces before you embed and retrieve them. The size and overlap of the chunks decide what can be found as a unit, so it quietly makes or breaks a retrieval system.