Effort

Also called: reasoning effort, thinking

Effort is a dial for how much internal reasoning a model spends before it answers. Turn it up for genuinely hard problems; you pay for it in latency and extra output tokens.

James Phoenix
Understanding Data
Updated July 2, 2026

Newer models let you set an effort level: roughly, how much thinking the model does at inference before it commits to an answer. Low effort answers fast and cheap. High effort lets the model work through a problem internally, often producing a chain of reasoning, before it writes the reply you see. You may also see this called reasoning effort or just "thinking."

What you are actually buying

Under the hood, more effort means the model generates more tokens while reasoning, even if a lot of that reasoning is hidden from the final answer. That has direct consequences:

  • Latency goes up. More internal work means you wait longer for the first useful output.
  • Cost goes up. Reasoning tokens are typically billed as output tokens, so you pay for them even when you never see them.
  • Quality goes up on hard tasks, not easy ones. On a tricky bug spanning several files, high effort earns its keep. On a rename or a one-line change, it is wasted money and time.
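
To make the trade-off concrete, here is a back-of-the-envelope sketch in Python. Every number in it is a made-up assumption for illustration (the token counts, the price per million output tokens, the generation speed), and the names `CallEstimate`, `estimate_cost_usd` and `estimate_generation_seconds` are hypothetical helpers, not part of any provider's API. It also assumes reasoning tokens are billed at the same rate as visible output tokens, which you should check against your provider's pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallEstimate:
    """Rough token budget for one model call (all values are assumptions)."""
    reasoning_tokens: int  # hidden "thinking" tokens
    answer_tokens: int     # tokens in the reply you actually see

    @property
    def output_tokens(self) -> int:
        # Assumption: reasoning tokens count as output tokens for billing.
        return self.reasoning_tokens + self.answer_tokens


def estimate_cost_usd(call: CallEstimate, price_per_million_output_usd: float) -> float:
    """Output-side cost only; input tokens are ignored to keep the sketch small."""
    return call.output_tokens * price_per_million_output_usd / 1_000_000


def estimate_generation_seconds(call: CallEstimate, tokens_per_second: float) -> float:
    """Time spent generating, assuming a constant decoding speed."""
    return call.output_tokens / tokens_per_second


# Hypothetical figures, chosen only to show the shape of the trade-off.
PRICE_PER_MILLION_OUTPUT_USD = 10.0
TOKENS_PER_SECOND = 50.0

low = CallEstimate(reasoning_tokens=200, answer_tokens=300)
high = CallEstimate(reasoning_tokens=4_000, answer_tokens=300)

for name, call in (("low", low), ("high", high)):
    cost = estimate_cost_usd(call, PRICE_PER_MILLION_OUTPUT_USD)
    seconds = estimate_generation_seconds(call, TOKENS_PER_SECOND)
    print(f"{name:>4}: {call.output_tokens} output tokens, ~${cost:.3f}, ~{seconds:.0f}s")
```

The visible answer is the same length in both cases; the difference in cost and wait time comes entirely from the reasoning tokens.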

Match the dial to the task

The skill is choosing deliberately rather than leaving it maxed out. Reach for high effort when the problem is genuinely hard: subtle logic, tangled dependencies, tricky planning. Drop to low effort for the routine bulk of coding work, where a fast answer is not just cheaper but keeps you in flow.
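
One way to make that choice explicit is a small routing function that picks an effort level from a few signals about the task. The sketch below is illustrative only: the `Task` fields, the three-file threshold and the `"low"`/`"high"` labels are assumptions, not a standard, and real providers may expose different level names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical description of a coding task; the fields are illustrative."""
    description: str
    files_touched: int = 1
    subtle_logic: bool = False
    needs_planning: bool = False


def choose_effort(task: Task) -> str:
    """Return "high" for genuinely hard tasks and "low" for routine ones.

    The threshold of three files is an arbitrary assumption; tune it to
    your own codebase.
    """
    is_hard = task.subtle_logic or task.needs_planning or task.files_touched >= 3
    return "high" if is_hard else "low"


rename = Task("rename a local variable")
cross_file_bug = Task("race condition across modules", files_touched=4, subtle_logic=True)

print(choose_effort(rename))          # routine work, so low effort
print(choose_effort(cross_file_bug))  # tricky and spread out, so high effort
```

The point of writing it down is less the rule itself than the habit: effort becomes a decision made per task rather than a default left at maximum.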

Tip
Effort is not a fix for a vague request. A model reasoning hard over the wrong context just produces a well-argued wrong answer. Get the inputs right first, then spend effort on the parts that deserve it.
