Reliability techniques

Self-consistency

Also called: majority voting, self consistency

Self-consistency samples the same prompt several times and takes the majority answer. It trades a few extra calls for a big drop in variance, turning a model that sometimes slips into one that reliably lands on its best answer.

James Phoenix
Understanding Data Updated July 3, 2026

A model is non-deterministic: ask it the same question twice and you can get two different answers. Self-consistency turns that from a bug into a feature. Instead of trusting a single sample, you generate several, then let them vote. The most common answer wins, and the odd unlucky miss gets outvoted.

Why it works

Different runs take different reasoning paths, but correct paths tend to converge on the same answer while wrong ones scatter. Counting the answers surfaces the one the model lands on most often, which is usually the right one. You are spending a handful of extra calls to buy down variance.

The pattern

Sample the prompt N times, normalise the answers, and take the mode:

TypeScript
import { generateText } from 'ai'
import { openai } from '@ai-sdk/openai'

const model = openai('gpt-5-mini')
const runs = await Promise.all(
  Array.from({ length: 5 }, () =>
    generateText({ model, prompt: 'Is 17 prime? Answer only yes or no.' })
  )
)

const tally = {}
for (const r of runs) {
  const a = r.text.trim().toLowerCase().replace(/[^a-z]/g, '')
  tally[a] = (tally[a] || 0) + 1
}
const answer = Object.entries(tally).sort((a, b) => b[1] - a[1])[0][0]

Five samples, one majority answer. The calls run in parallel, so you pay in tokens, not wall-clock time.

When to use it

  • Best on tasks with a small answer space: classifications, yes/no calls, short extractions where "the same answer" is easy to define.
  • The simplest reliability lever there is. Unlike a full evaluator loop, it needs no reward function, just a vote.
  • It has a ceiling. If the model is wrong most of the time, the majority is wrong too. Voting sharpens a decent model; it cannot rescue a bad one.
Tip
Self-consistency pairs well with an LLM-as-judge: sample several answers, then let a judge pick the best instead of a plain majority vote when the answers are longer than a single label.

Related terms

Engineering context for real systems?

Getting the right information into the window at the right time is most of the job. If you want that thinking applied to your product, that is what I do.

See how I can help