A model is non-deterministic: ask it the same question twice and you can get two different answers. Self-consistency turns that from a bug into a feature. Instead of trusting a single sample, you generate several, then let them vote. The most common answer wins, and the odd unlucky miss gets outvoted.
Why it works
Different runs take different reasoning paths, but correct paths tend to converge on the same answer while wrong ones scatter. Counting the answers surfaces the one the model lands on most often, which is usually the right one. You are spending a handful of extra calls to buy down variance.
The pattern
Sample the prompt N times, normalise the answers, and take the mode:
import { generateText } from 'ai'
import { openai } from '@ai-sdk/openai'
const model = openai('gpt-5-mini')
const runs = await Promise.all(
Array.from({ length: 5 }, () =>
generateText({ model, prompt: 'Is 17 prime? Answer only yes or no.' })
)
)
const tally = {}
for (const r of runs) {
const a = r.text.trim().toLowerCase().replace(/[^a-z]/g, '')
tally[a] = (tally[a] || 0) + 1
}
const answer = Object.entries(tally).sort((a, b) => b[1] - a[1])[0][0]Five samples, one majority answer. The calls run in parallel, so you pay in tokens, not wall-clock time.
When to use it
- Best on tasks with a small answer space: classifications, yes/no calls, short extractions where "the same answer" is easy to define.
- The simplest reliability lever there is. Unlike a full evaluator loop, it needs no reward function, just a vote.
- It has a ceiling. If the model is wrong most of the time, the majority is wrong too. Voting sharpens a decent model; it cannot rescue a bad one.
Related terms
Agents vs. workflows
A workflow follows a path you designed in advance; an agent decides its own path at run time by calling tools in a loop toward a goal. Knowing which one you actually need is the first context-engineering decision.
Read definition →LLM-as-judge
An LLM-as-judge uses one model call to score the output of another against a rubric. It is how you evaluate fuzzy, open-ended work at scale when there is no single correct answer to match against.
Read definition →