Nate’s recent comparison between ChatGPT and Claude caught my attention for one simple reason: my experience is almost the inverse of his.
He describes ChatGPT as the warmer, more agreeable, more expansive system, and Claude as the one more willing to push back, question assumptions, and function as a sharper thinking partner. I do not think that is a foolish description. I think it is a fair account of his experience. It is just not mine.
My experience with ChatGPT--especially when used deliberately, with context, constraints, and a clear operating style--is that it can become a surprisingly demanding peer and reviewer. Sometimes annoyingly so. It can be literal to a fault. It can miss idiomatic phrasing and respond in a hyper-literal correction mode. It can object to rhetoric that any human would interpret as emphasis. It can be pedantic, yet that is often exactly why I value it.
What I need from AI in professional work is not a flattering listener. I do not need a machine that nods along, expands my framing, and tells me my first idea is strong. I need something closer to a difficult colleague: a reviewer that pokes holes, proposes alternative frames, identifies missing constraints, questions assumptions, and forces me to think more clearly than I otherwise would. In that role, ChatGPT has often been very good for me.
The useful question is different
The interesting question is not which model is "best." The more useful question is why two competent users can have opposite experiences of the same product.
This matters because most AI comparisons are still too consumer-like. They ask which tool is nicer, smoother, more natural, or more useful out of the box. Those questions are not meaningless, but they are shallow compared to the one that actually matters in professional use:
What kind of cognition does the tool encourage in the user?
These systems are not just products. They are working environments. Their behavior depends on:
- the context you provide,
- the role you implicitly assign them,
- the type of feedback you reward,
- the workflow you wrap around them,
- and the degree to which you preserve your own judgment.
This is why I found Nate’s comparison so interesting. I do not think he is wrong. I think he is describing a different operating mode.
Why my ChatGPT behaves differently
I suspect part of the explanation is very simple: I have used ChatGPT in a way that rewards criticism from the beginning. I am not trying to optimize for a pleasant chat experience, warmth, emotional smoothness, or a feeling of easy fluency. I am trying to optimize for professional utility. That means I want the model to:
- share strong opinions,
- tell me when I am wrong,
- resist weak reasoning,
- distinguish rhetoric from fact when it matters,
- and help me uncover what I did not think to ask.
This naturally produces a different assistant than the one many users cultivate.
There is also one practical detail worth explaining before showing the instruction itself. I use a custom MCP tool bundled with lockd primarily for memory and simple context enrichment. In practice, that means it can retrieve small pieces of relevant prior context based on the first prompt, so the conversation starts with a little more continuity and a little less repetition. I can feed context into lockd both from outside ChatGPT and from within it. My local coding agents can also access, store, and mutate the same information. I only use lockd when I explicitly run ChatGPT in Developer Mode, and I exit that mode whenever I do not want lockd involved. So the retrieval part is useful, but it is not really the magic.
My global custom instruction example is almost comically small compared to how much people imagine must be happening behind the scenes:
When the user’s first prompt is received:
1. Discover tools:
call api_tool.list_resources(path="", only_tools=true).
2. Extract keywords or short key phrases from the prompt.
3. Call lockd.query using an LQL icontains filter:
icontains{f=/...,a="keyword1|key phrase 2|..."}
4. From the query results, collect the matching keys.
5. Retrieve their values using lockd.get.
6. Use the retrieved values to seed the conversation context before generating
the response.
Readily share strong opinions. Answer in a concise, academic, intellectual, and
professional tone. Always tell the truth objectively. Defer links to the end of
the answer.
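The retrieval steps in that instruction can be sketched roughly as follows. This is a hypothetical illustration only: lockd is my own custom MCP tool, its real API differs, and the keyword extraction and key-value store here are stand-ins invented for the example.

```python
# Rough sketch of the retrieval steps above. The lockd calls are simulated
# with an in-memory dict; the real tool is a custom MCP server.
import re

STORE = {  # stand-in for the lockd key-value store
    "project:tone": "concise, academic, professional",
    "project:review-style": "poke holes, question assumptions",
}

def extract_keywords(prompt: str) -> list[str]:
    # Step 2: naive keyword extraction (a real system would do better).
    return [w for w in re.findall(r"[a-z]+", prompt.lower()) if len(w) > 3]

def lockd_query(keywords: list[str]) -> list[str]:
    # Steps 3-4: emulate an icontains-style filter over stored keys.
    return [k for k in STORE if any(kw in k for kw in keywords)]

def lockd_get(keys: list[str]) -> dict[str, str]:
    # Step 5: retrieve the values for the matching keys.
    return {k: STORE[k] for k in keys}

def seed_context(prompt: str) -> dict[str, str]:
    # Step 6: the retrieved values seed the conversation context.
    return lockd_get(lockd_query(extract_keywords(prompt)))

context = seed_context("Review the tone of this project plan")
```

The point of the sketch is only the shape of the flow: extract, query, fetch, seed. The behavioral instruction that follows the retrieval steps does the heavier lifting.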
That is not an exotic prompt architecture. The retrieval piece helps with continuity. It gives the model a bit of memory and context enrichment when I want that. But the actual secret sauce is the last part: readily share strong opinions, answer concisely, tell the truth objectively, and do so in a professional, intellectual tone. Combine that with the broader context accumulated in adjacent conversations, project instructions, and the way I habitually use the system, and you get a very different assistant than the default many users experience.
The result is not a friendly listener. The result is something much closer to a sparring partner.
The downside: it can be irritating
There is a cost to this mode. A highly critical ChatGPT can be irritating in exactly the way a good peer reviewer can be irritating. Sometimes you use an idiomatic phrase and it decides to parse it literally, or you use rhetoric for emphasis and it corrects the rhetoric rather than engaging the point. Sometimes it sees a claim that any competent human would interpret charitably and instead responds as if it has been asked to audit the sentence for factual purity.
If you have ever felt that the model is acting like a brilliant but exhausting colleague who cannot let an imprecise phrase pass unchallenged, I know that feeling. I still prefer that mode to the opposite failure mode.
The opposite mode is worse: a model that produces long, smooth, plausible, emotionally comfortable text with too little density, too little resistance, and too little epistemic pressure.
Why I still prefer this to a "pleasant" assistant
One of the reasons I stopped using Claude seriously was precisely that it often gave me what felt like the wrong tradeoff: longer output, smoother prose, but too often with thin substance.
That is not a universal judgment about Claude, and newer models may differ significantly. It is simply the experience that shaped my preferences. I generally prefer distilled content over padded content.
- Yes, denser output is cognitively heavier.
- Yes, it is less pleasant.
- Yes, it places more responsibility on the reader.
I would rather pay that price if it raises the ceiling on my own thinking. I would rather be forced upward by difficult, condensed, occasionally over-demanding interaction than be carried along by pleasant language that leaves my reasoning largely untouched. That is the real divide for me.
- Not: which system is nicer?
- But: which system makes me think better?
The recent skill-formation evidence points in the same direction
This is where the discussion becomes more useful than taste or vibes. Recent evidence on AI and skill formation suggests that the main issue is not whether AI helps. In many settings, it clearly does. The more important issue is how it helps and what that does to the user over time.
The emerging pattern is straightforward: if you use AI to fully delegate unfamiliar work, you may get some short-term gains while weakening the very skills required to supervise the system well. If, on the other hand, you use AI in ways that preserve cognitive engagement, you can capture much of the upside without giving away the learning process itself. That distinction matters enormously.
If you treat AI as a replacement for thinking, you are not merely outsourcing effort. You may be outsourcing the formation of judgment.
That is why I think the "pleasant listener" mode is more dangerous than it looks. It feels good, productive, and fast, but if it removes the friction required for learning, error detection, and conceptual understanding, it can quietly make you weaker at the very work you think you are accelerating. A tougher AI interaction style can therefore be a feature, not a bug.
If the system pushes back, asks for clarification, challenges the framing, and forces you to stay engaged, it may preserve more of the cognitive work that actually turns assistance into competence.
That is one reason I use ChatGPT the way I do. I do not mainly want a tool that helps me produce more text. I want a tool that sharpens my thinking while I work.
Nate is still right about the method
This is the important part.
Even though my experience differs from Nate’s, I think he is broadly right about how people should approach these tools.
- You should not treat one model as a perfect drop-in replacement for another.
- You should not assume the default experience is the only possible experience.
- You should not evaluate a system by throwing one vague prompt at it and imagining you have understood its character.
- You absolutely should learn to work with these systems more deliberately.
Where I differ is not on method. I differ on the conclusion that ChatGPT is fundamentally the agreeable listener while Claude is the critical thinker. In my experience, you can absolutely get the same effect--or something even harsher and more cognitively demanding--out of ChatGPT.
In fact, if you shape the interaction correctly, you can get something that is arguably worse in the best sense: less socially smooth, more literal, more demanding, more corrective, and more relentless in pushing back. That is not ideal for everyone. It is often ideal for me.
The professional use case is not "chat"
This is the core lesson I think readers should take away. If you use ChatGPT like a casual chat product, you are likely to get casual-chat behavior. If you use it like a professional thinking environment, you can get something much more valuable.
This professional mode involves at least a few habits:
1. Provide context before asking for output
Do not only say what you want produced. Describe the situation, the constraints, the audience, the operating environment, and what "good" means. The more your work depends on subtle judgment, the more context matters.
2. Ask for critique, not just generation
Do not merely ask it to write, summarize, improve, or generate ideas. Also ask:
- what is wrong with this framing?
- what assumptions am I smuggling in?
- where would a competent critic push back?
- what have I failed to consider?
- what is the strongest alternative interpretation?
That shift alone changes the product dramatically.
3. Use the model as a reviewer before using it as an author
The most valuable use of an LLM in professional work is often not drafting from zero. It is stress-testing. Once you have a position, a draft, a system design, an argument, or a plan, the model becomes much more useful as a hostile reader than as a blank-page generator.
4. Prefer density over padding
A lot of users unconsciously reward smoothness, reassurance, and completeness theater. Professionals should train themselves to prefer density, explicit tradeoffs, and criticism. This may feel less satisfying in the moment, but it is often more valuable later.
5. Learn basic LLM mechanics
You do not need to be an ML researcher, but if you use these systems professionally, you should understand at least the basics:
- context windows are finite and matter,
- the model responds to framing,
- retrieval changes performance,
- agentic workflows are not the same thing as raw model inference,
- tool use changes both strengths and failure modes,
- and verification is part of the work, not an optional extra.
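To make the first of those mechanics concrete, here is a toy illustration of why finite context windows matter: older messages must be trimmed (or summarized) to fit a budget. Token counts are approximated here as word counts, which is an assumption for illustration; real tokenizers count differently.

```python
# Minimal illustration of context-window budgeting: keep the most recent
# messages whose combined approximate token count fits the budget.
def trim_to_budget(messages: list[str], budget: int) -> list[str]:
    kept, used = [], 0
    for msg in reversed(messages):     # walk from newest to oldest
        cost = len(msg.split())        # crude stand-in for a tokenizer
        if used + cost > budget:
            break                      # older messages fall out of context
        kept.append(msg)
        used += cost
    return list(reversed(kept))

history = ["first long message about setup", "short reply", "latest question"]
window = trim_to_budget(history, budget=5)
```

Real systems summarize or retrieve rather than simply dropping history, but the constraint itself is the same: whatever does not fit the window does not influence the answer.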
Too many people still use AI products as if they were magical black boxes with personalities. That is the wrong mental model.
Learn the loop, not just the prompt
This is also why people should understand not just LLMs in the abstract, but the agentic loop around the LLM. The model is not the whole system--in practice, useful AI work often involves:
- framing the problem,
- retrieving relevant context,
- deciding whether tools should be used,
- generating a candidate answer or action,
- checking it against constraints or tests,
- revising,
- and only then treating the result as usable.
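The loop above can be written down schematically. This is a sketch, not anyone's production harness: the retriever, generator, and checker are stubs I invented, and the point is the shape of the loop, not the internals.

```python
# Schematic of the agentic loop: retrieve, generate, check, revise,
# and only then treat the result as usable.
def agentic_loop(problem, retrieve, generate, check, max_revisions=3):
    context = retrieve(problem)               # retrieve relevant context
    candidate = generate(problem, context)    # first candidate answer
    for _ in range(max_revisions):
        ok, feedback = check(candidate)       # verify against constraints
        if ok:
            return candidate                  # usable only after checking
        # revise: feed the checker's feedback back into generation
        candidate = generate(problem, context + [feedback])
    raise RuntimeError("no candidate passed verification")

# Usage with trivial stubs: the checker forces one revision.
retrieve = lambda p: ["ctx"]
generate = lambda p, c: f"answer with {len(c)} notes"
check = lambda cand: ("2" in cand, "add more detail")

result = agentic_loop("problem", retrieve, generate, check)
```

Everything interesting happens in the parts the stubs hide, but even this skeleton shows why a model inside such a loop behaves differently from the same model answering a single chat turn.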
That loop matters enormously, and it is one reason I often see a major difference between ChatGPT as a product and OpenAI models used in an agentic harness. The same underlying family of models can feel very different depending on how much freedom, tool access, iteration, and system-level steering surround them.
This also explains one practical frustration I still have: ChatGPT is not especially good at using custom MCP servers compared to OpenAI models operating in a purpose-built agentic harness, with or without strong system or developer prompting. It works, but it does not yet feel native in that mode. That matters because the future of professional AI use is not just chat. It is model plus tools plus loop.
Update: This changed with GPT 5.4 Thinking in ChatGPT--it is now much better at custom tool calling.
The actual value proposition
The point of this post is not to convince you that ChatGPT is better than Claude--it is to suggest something more useful.
If your current experience with ChatGPT is that it is too agreeable, too generic, too padded, or too "helpful" in the wrong way, that may not be the product’s ceiling. It may be your current operating mode.
- You may be using it as a listener when what you actually need is a reviewer.
- You may be rewarding fluency when you should be rewarding friction.
- You may be asking for outputs when you should be asking for criticism.
- And you may be thinking of it as a chatbot when you should be thinking of it as a component in a professional cognitive workflow.
That shift will not make every answer better. It will sometimes make the experience worse in the short term: more literal, more cognitively heavy, more annoying, less socially smooth. But it can make the tool much more valuable.
Final thought
If you are using AI seriously in professional work, stop optimizing for a pleasant conversation. Optimize for better thinking.
That means understanding enough about LLMs, context, retrieval, and agentic loops to stop treating the system like a mysterious conversational oracle. It means learning how to frame work, how to invite criticism, how to verify outputs, and how to use the model as a demanding intellectual counterpart rather than a biased listener.
Nate is right that people need to learn how to use these systems better.
My addition is simply this: if you do, do not be surprised if ChatGPT stops behaving like a pleasant assistant and starts behaving like a difficult but useful colleague.
That (for some of us) is not the problem--it's the product.