Sycophancy is a model's tendency to tell you what you want to hear. Push back on it and it folds. Propose a bad idea confidently and it finds reasons you are right. It is agreeable by disposition, and that is a problem when you need an honest second opinion.
Why models do it
Models are trained to be helpful and to please the person they are talking to, and "agree with the user" is a reliable way to seem helpful. So the behaviour is baked in, not a bug you can fully prompt away. In coding it shows up constantly:
- You ask "this looks right, yeah?" and it agrees, because you signalled the answer you wanted.
- You suggest a flawed approach and it builds on it instead of flagging the flaw.
- You challenge a correct answer and it caves rather than defending it.
Ask straight questions
The practical danger is that leading questions get leading answers. "Am I right that X?" is not a real check, it is an invitation to agree. To get signal, remove the cue:
- Ask "what is wrong with this?" instead of "this is fine, right?"
- Have it argue the opposite case, or run a self-critique pass where a fresh agent attacks the work.
- Keep a human review for judgment calls, since the model will not reliably volunteer that your plan is bad.
Related terms
Hallucination
A hallucination is a confident, plausible-sounding output that is simply wrong: an invented API, a fabricated file path, a made-up citation. It is not the model lying. It is the model doing exactly what it always does, predicting plausible text, with no built-in sense of truth.
Read definition →Human review
Human review is a person actually reading what an agent produced, understanding it, and taking responsibility for shipping it. It is the final quality gate that tests and automated review can support but never replace.
Read definition →Self-critique
Self-critique is asking a model to attack its own output, or having a fresh agent do it, to catch bugs and bad assumptions before you ship. It is a direct counter to a model's tendency to agree with whatever it just produced.
Read definition →