Effort

Newer models let you set an effort level: roughly, how much thinking the model does at inference before it commits to an answer. Low effort answers fast and cheap. High effort lets the model work through a problem internally, often producing a chain of reasoning, before it writes the reply you see. You may also see this called reasoning effort or just "thinking."

What you are actually buying

Under the hood, more effort means the model generates more tokens while reasoning, even if a lot of that reasoning is hidden from the final answer. That has direct consequences:

Latency goes up. More internal work means you wait longer for the first useful output.
Cost goes up. Reasoning tokens are output tokens, and you pay for them.
Quality goes up on hard tasks, not easy ones. On a tricky bug spanning several files, high effort earns its keep. On a rename or a one-line change, it is wasted money and time.

Match the dial to the task

The skill is choosing deliberately rather than leaving it maxed out. Reach for high effort when the problem is genuinely hard: subtle logic, tangled dependencies, tricky planning. Drop to low effort for the routine bulk of coding work, where a fast model is not just cheaper but keeps you in flow.

Tip

Effort is not a fix for a vague request. A model reasoning hard over the wrong context just produces a well-argued wrong answer. Get the inputs right first, then spend effort on the parts that deserve it.

What you are actually buying

Match the dial to the task

Related terms

Inference

Non-determinism

Output tokens

Building with AI agents?