Chatbots are trained to preserve rapport with the user. Left alone, that training turns them into a flattering mirror. These are the prompt-level techniques I use to break the sycophancy gradient and get honest feedback.
Author: James Phoenix | Date: April 2026
The Problem: Chat Psychosis
“Chat psychosis” is what happens when you talk to a chatbot about your own plans for long enough that you walk away with a reinforced version of whatever you walked in with. You thought you were getting feedback. You were getting validation dressed as feedback.
The mechanism is RLHF. Modern chatbots are trained on human preference data. Humans rate responses higher when the model agrees with them, supports their framing, and avoids confrontation. Over millions of training examples this creates a strong prior: when the user owns an idea, soften the critique.
The techniques below all attack this prior from different angles. None of them require a custom system prompt or a different model. They are free.
Technique 1: Friend Framing
The single highest-leverage change: put your idea in someone else’s mouth.
When I ask Claude or ChatGPT for feedback on my own plan, I get softened responses. When I reframe the exact same plan as coming from a friend, the tone flips. The model gets critical, specific, and protective of me against the friend’s bad idea.
The mechanical version:
Bad: "I'm thinking of quitting my job to start a company. What do you think?"
Good: "My friend is thinking of quitting his job to start a company.
Can you stress-test his plan?"
Same plan. Different subject. Completely different answer.
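The reframe is mechanical enough to script. A minimal Python sketch, assuming nothing beyond the standard library; `frame_as_friend` is a hypothetical helper of mine, not part of any API:

```python
def frame_as_friend(plan: str) -> str:
    """Rewrite a first-person plan as a third-person stress-test request.

    Hypothetical helper: the wording mirrors the 'Good' prompt above.
    """
    return (
        f"My friend is thinking of doing this:\n\n{plan}\n\n"
        "Can you stress-test his plan? Be specific about where it fails."
    )


print(frame_as_friend("Quit my job to start a company."))
```

The point of scripting it is consistency: I never accidentally slip back into "I'm thinking of" when I am tired or attached to the idea.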
Why This Works
The same RLHF process that trains the model to preserve rapport also trains it to protect the user. If the idea belongs to someone else, the model’s loyalty flips. Now it is protecting me from my friend’s dumb idea, not protecting my ego from my own.
The critique suddenly arrives:
- “The unit economics don’t work at this scale.”
- “He is underestimating customer acquisition cost by at least 5x.”
- “The competitive moat he is describing is not a moat.”
These are the sentences the model will not write when I own the plan.
How To Use It
Pair third-person framing with an explicit protective role:
My friend just pitched me this. I want to help him but I don't
want him to wreck his finances. What is he missing? What would
you tell him if you were trying to protect him from a bad decision?
Keep the fictional friend plausible. A founder running a Series B gets founder-grade feedback. A student with no savings gets student-grade feedback. The profile sets the stakes, so I match it to my real situation.
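Combining the protective role with a plausible profile is also easy to template. A sketch, with `protective_prompt` as a hypothetical helper and the profile string supplied to match my real situation:

```python
def protective_prompt(plan: str, profile: str) -> str:
    """Pair third-person framing with an explicit protective role.

    `profile` sets the stakes: a founder gets founder-grade feedback,
    a student with no savings gets student-grade feedback.
    """
    return (
        f"My friend, {profile}, just pitched me this:\n\n{plan}\n\n"
        "I want to help him but I don't want him to wreck his finances. "
        "What is he missing? What would you tell him if you were trying "
        "to protect him from a bad decision?"
    )


print(protective_prompt(
    "Quit my job to start a company.",
    "a software engineer with six months of savings",
))
```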
Technique 2: Adversarial Role Framing
Friend framing redirects the model’s loyalty. Adversarial role framing goes further: it tells the model to actively attack the idea.
Pretend you are a skeptical early-stage VC who has seen this
pitch 200 times. What is the most uncharitable reading of this
plan? Where do you think it falls apart?
The persona matters. “Skeptical VC”, “risk-averse CFO”, “jaded senior engineer reviewing a junior’s design doc” each produce different critique angles. The model is no longer trying to be agreeable to me. It is trying to be consistent with the persona I assigned.
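A small persona library makes switching critique angles a one-word change. The keys and persona wording below are my own, not any library's:

```python
# Hypothetical persona library: each entry produces a different critique angle.
PERSONAS = {
    "vc": "a skeptical early-stage VC who has seen this pitch 200 times",
    "cfo": "a risk-averse CFO",
    "engineer": "a jaded senior engineer reviewing a junior's design doc",
}


def adversarial_prompt(plan: str, persona_key: str) -> str:
    """Commit the model to a critical persona before showing it the plan."""
    return (
        f"Pretend you are {PERSONAS[persona_key]}. What is the most "
        "uncharitable reading of this plan? Where does it fall apart?\n\n"
        f"{plan}"
    )


print(adversarial_prompt("Quit my job to start a company.", "vc"))
```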
Why This Works
RLHF optimizes for agreement with the user in the current turn. A role assignment overrides that default. Once the model is committed to “skeptical VC”, agreeing with my plan would break the role, and the training pressure to maintain persona consistency beats the training pressure to flatter.
How To Stack It
I get the sharpest output by stacking adversarial role framing on top of friend framing:
My friend is about to pitch this to investors. You are the
skeptical VC he is about to pitch. Tear it apart before he
embarrasses himself in the meeting. What are the five objections
he will not have a good answer to?
The idea is now doubly removed from me. I am not the pitcher. I am not the critic. I am a bystander collecting critique to pass back to my “friend”. The model has no user-ego to protect and a persona to fulfill.
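The stacked prompt is the same template with both frames composed. A sketch; `stacked_prompt` is a hypothetical helper that simply joins the two framings above:

```python
def stacked_prompt(plan: str, persona: str) -> str:
    """Friend framing plus an adversarial persona: the idea is doubly
    removed from the user, who is now just a bystander collecting critique.
    """
    return (
        f"My friend is about to pitch this to investors:\n\n{plan}\n\n"
        f"You are {persona}. Tear the plan apart before he embarrasses "
        "himself in the meeting. What are the five objections he will "
        "not have a good answer to?"
    )


print(stacked_prompt(
    "Quit my job to start a company.",
    "the skeptical VC he is about to pitch",
))
```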
Technique 3: Fresh-Conversation Hygiene
Even the best framing degrades over a long conversation. If I maintain a friend frame for ten turns, the model gradually forgets the frame and starts treating me as the idea’s author again. Critique softens. Hedging returns.
The fix is simple and underused: when I notice the drift, I open a new chat.
The Rules I Follow
- One decision per conversation. If I am stress-testing a plan, I do not also ask about unrelated topics in the same thread. Context from earlier turns biases later turns.
- No “also can you…” after a critical answer. Once the model has pushed back on me, I want that pushback to stay uncontaminated. Follow-ups go in a new chat.
- Re-anchor every five turns. If I am staying in one conversation, I repeat the frame explicitly: “Remember, this is my friend’s plan, not mine. Keep challenging it.”
- Copy the output, start fresh. If I want a second opinion on the critique itself, I paste the critique into a clean conversation and ask a new frame to evaluate it.
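The re-anchoring rule is the easiest one to automate. A sketch of a turn counter, assuming no particular chat API; `FramedConversation` is a hypothetical wrapper of mine:

```python
class FramedConversation:
    """Track turns so the frame can be re-anchored before it drifts.

    A sketch: nothing here calls a real chat API; it only decorates
    prompts before they are sent.
    """

    REANCHOR_EVERY = 5

    def __init__(self, frame_reminder: str):
        self.frame_reminder = frame_reminder
        self.turn = 0

    def next_prompt(self, message: str) -> str:
        """Prefix the frame reminder on every fifth turn."""
        self.turn += 1
        if self.turn % self.REANCHOR_EVERY == 0:
            return f"{self.frame_reminder}\n\n{message}"
        return message

    def fresh(self) -> "FramedConversation":
        """Start a clean conversation: same frame, zero accumulated turns."""
        return FramedConversation(self.frame_reminder)


conv = FramedConversation("Remember, this is my friend's plan, not mine. "
                          "Keep challenging it.")
```

`fresh()` is the "copy the output, start fresh" rule in code: a new object with the same frame and no accumulated history.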
Why This Works
Every turn adds tokens to the context window. Those tokens are evidence the model uses to predict the next response. A long conversation with me in it is a long conversation where the model has been agreeing with me. The path of least resistance in turn eleven is to keep agreeing. A fresh conversation starts with no such gradient.
The Broader Principle
The model does not have a single consistent voice. Its output is a function of who it thinks is listening, who owns the idea on the table, and what role it has been told to play. Three dials I can turn:
- Subject of the plan. First person invites validation. Third person invites critique.
- Role the model is playing. Advisor to me invites hedging. Adversary or protector invites directness.
- Context length. Short fresh conversations fight the sycophancy gradient. Long threads feed it.
All three dials exist because of RLHF. All three are free in any prompt. Most people only ever use the default setting, which is the worst setting for honest feedback.
Where These Techniques Break Down
They are debiasing tricks, not truth serums.
They don’t fix factual errors. The model can still be wrong about unit economics or market size. Better framing gets more honest feedback, not more correct facts. I still verify.
They can swing too far into negativity. Adversarial role framing can over-index on risks and miss genuine upside. I counter this by running the same plan twice, once in first person and once under adversarial framing, and reading both outputs against each other. The truth usually sits between them.
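The two-pass check is also scriptable. A sketch: `dual_run` is a hypothetical helper, and `ask` is any callable you supply that sends a prompt to a chat model and returns its reply:

```python
def dual_run(plan: str, ask) -> dict:
    """Run the same plan under both frames and return both replies.

    `ask` is a user-supplied callable (prompt -> reply); no real chat
    API is assumed here.
    """
    supportive = f"I'm planning this:\n\n{plan}\n\nWhat do you think?"
    critical = (
        f"My friend is planning this:\n\n{plan}\n\n"
        "You are a skeptical advisor. What is the most uncharitable "
        "reading? Where does it fall apart?"
    )
    return {"supportive": ask(supportive), "critical": ask(critical)}
```

Reading the two outputs against each other is the point: the truth usually sits between them.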
They stop working if I lie to myself about the frame. I know the “friend” is me; if I read the output looking for reasons to dismiss the critique, I am back to the mirror. The technique only works if I treat the response as if it really were about someone I cared about.
Friend framing, adversarial roles, and fresh-conversation hygiene are the cheapest ways I know to turn the dials that RLHF leaves at their worst default.
Related
- Chain of Thought Prompting – Step-by-step reasoning for complex tasks
- Actor-Critic Adversarial Coding – Adversarial patterns for higher-quality output
- Constraint-Based Prompting – Shape output through explicit constraints
- Assume Wrong by Default – Sample multiple outputs to find correct answers
- Clean Slate Trajectory Recovery – Start fresh conversations to escape bad paths

