Non-determinism

Send a model the exact same prompt twice and you can get two different answers. That is non-determinism, and it is by design, not a bug. During next-token prediction the model produces a probability distribution over possible next tokens, and then it usually samples from that distribution rather than always taking the single most likely token. A setting called temperature controls how much randomness gets mixed in. Higher temperature, more variety; lower, more repetition.

Why providers do this

A little randomness makes output feel less robotic and helps the model escape repetitive ruts. The trade is reproducibility. Even at very low temperature you are not guaranteed identical results, because inference runs on batched hardware where tiny numerical variations creep in.

What it means for coding work

This is easy to forget until it bites you:

A passing run is not proof. An agent solving a task once does not mean it will solve it every time. If reliability matters, run it more than once.
Do not hard-code on exact wording. Tests or scripts that assume the model returns a specific string will be flaky. Assert on behaviour or structure instead.
Bugs can be intermittent. A prompt that fails one time in five is still broken. Chase the pattern, not the single lucky success.

Watch out

Non-determinism compounds in agents. A different early choice sends the whole run down a different path, so two attempts at the same task can diverge completely. Reproducing an agent failure often takes several tries.

Why providers do this

What it means for coding work

Related terms

Inference

Next-token prediction

Effort

Building with AI agents?