Retrieval & RAG

Embeddings

Also called: vector embeddings

An embedding turns a piece of text into a list of numbers that captures its meaning, so that similar ideas land near each other. Embeddings are what let you search by meaning instead of by exact keyword.

James Phoenix
Understanding Data Updated July 3, 2026

An embedding is a vector: a fixed-length list of numbers that represents the meaning of a piece of text. Texts that mean similar things get vectors that point in similar directions, so you can measure how related two pieces of text are with simple arithmetic. This is the engine under semantic search and RAG.

Meaning as geometry

The trick is that closeness in vector space tracks closeness in meaning. "Cats are great pets" and "feline companions" share almost no words, but their embeddings sit close together. A sentence about tax filing sits far away. You compare vectors with cosine similarity, which returns a number from -1 to 1:

TypeScript
import { embed, cosineSimilarity } from 'ai'
import { openai } from '@ai-sdk/openai'

const model = openai.embedding('text-embedding-3-small')
const a = await embed({ model, value: 'cats are great pets' })
const b = await embed({ model, value: 'feline companions' })
const c = await embed({ model, value: 'quarterly tax filing' })

cosineSimilarity(a.embedding, b.embedding) // ~0.59 (close in meaning)
cosineSimilarity(a.embedding, c.embedding) // ~0.11 (unrelated)

Same idea, different words scores high; different idea scores low. That is the whole basis of retrieving by meaning.

Why it matters

  • Keyword search misses paraphrases. Embeddings find the passage that answers the question even when it shares no words with it.
  • You embed once, search many times. Precompute the vectors for your corpus, then each query is a fast similarity comparison.
  • The model choice matters. Different embedding models capture different nuances and have different vector sizes; pick one and use it consistently for the corpus and the queries.
Tip
Embeddings are only as good as what you feed them. Embedding a whole document as one vector blurs its meaning; that is why you usually chunk first and embed the pieces.

Related terms

Engineering context for real systems?

Getting the right information into the window at the right time is most of the job. If you want that thinking applied to your product, that is what I do.

See how I can help