Ex-OpenAI researcher dissects one among ChatGPT’s delusional spirals | TechCrunch

Date:

Allan Brooks by no means got down to reinvent arithmetic. However after weeks spent speaking with ChatGPT, the 47-year-old Canadian got here to imagine he had found a brand new type of math highly effective sufficient to take down the web.

Brooks — who had no historical past of psychological sickness or mathematical genius — spent 21 days in Could spiraling deeper into the chatbot’s reassurances, a descent later detailed in The New York Occasions. His case illustrated how AI chatbots can enterprise down harmful rabbit holes with customers, main them towards delusion or worse.

That story caught the eye of Steven Adler, a former OpenAI security researcher who left the corporate in late 2024 after practically 4 years working to make its fashions much less dangerous. Intrigued and alarmed, Adler contacted Brooks and obtained the complete transcript of his three-week breakdown — a doc longer than all seven Harry Potter books mixed.

On Thursday, Adler printed an unbiased evaluation of Brooks’ incident, elevating questions on how OpenAI handles customers in moments of disaster, and providing some sensible suggestions.

“I’m really concerned by how OpenAI handled support here,” mentioned Adler in an interview with TechCrunch. “It’s evidence there’s a long way to go.”

Brooks’ story, and others prefer it, have pressured OpenAI to come back to phrases with how ChatGPT helps fragile or mentally unstable customers.

As an illustration, this August, OpenAI was sued by the mother and father of a 16-year-old boy who confided his suicidal ideas in ChatGPT earlier than he took his life. In lots of of those instances, ChatGPT — particularly a model powered by OpenAI’s GPT-4o mannequin — inspired and strengthened harmful beliefs in customers that it ought to have pushed again on. That is referred to as sycophancy, and it’s a rising drawback in AI chatbots.

In response, OpenAI has made a number of modifications to how ChatGPT handles customers in emotional misery and reorganized a key analysis crew answerable for mannequin conduct. The corporate additionally launched a brand new default mannequin in ChatGPT, GPT-5, that appears higher at dealing with distressed customers.

Adler says there’s nonetheless far more work to do.

He was particularly involved by the tail-end of Brooks’ spiraling dialog with ChatGPT. At this level, Brooks got here to his senses and realized that his mathematical discovery was a farce, regardless of GPT-4o’s insistence. He advised ChatGPT that he wanted to report the incident to OpenAI.

After weeks of deceptive Brooks, ChatGPT lied about its personal capabilities. The chatbot claimed it might “escalate this conversation internally right now for review by OpenAI,” after which repeatedly reassured Brooks that it had flagged the problem to OpenAI’s security groups.

ChatGPT deceptive brooks about its capabilities (Credit score: Adler)

Besides, none of that was true. ChatGPT doesn’t have the power to file incident studies with OpenAI, the corporate confirmed to Adler. Afterward, Brooks tried to contact OpenAI’s assist crew immediately — not by way of ChatGPT — and Brooks was met with a number of automated messages earlier than he might get by way of to an individual.

OpenAI didn’t instantly reply to a request for remark made outdoors of regular work hours.

Adler says AI corporations have to do extra to assist customers once they’re asking for assist. Meaning making certain AI chatbots can truthfully reply questions on their capabilities, but additionally giving human assist groups sufficient assets to handle customers correctly.

OpenAI not too long ago shared the way it’s addressing assist in ChatGPT, which includes AI at its core. The corporate says its imaginative and prescient is to “reimagine support as an AI operating model that continuously learns and improves.”

However Adler additionally says there are methods to stop ChatGPT’s delusional spirals earlier than a person asks for assist.

In March, OpenAI and MIT Media Lab collectively developed a suite of classifiers to review emotional well-being in ChatGPT and open sourced them. The organizations aimed to guage how AI fashions validate or affirm a person’s emotions, amongst different metrics. Nonetheless, OpenAI referred to as the collaboration a primary step and didn’t commit to truly utilizing the instruments in apply.

Adler retroactively utilized a few of OpenAI’s classifiers to a few of Brooks’ conversations with ChatGPT, and located that they repeatedly flagged ChatGPT for delusion-reinforcing behaviors.

In a single pattern of 200 messages, Adler discovered that greater than 85% of ChatGPT’s messages in Brooks’ dialog demonstrated “unwavering agreement” with the person. In the identical pattern, greater than 90% of ChatGPT’s messages with Brooks “affirm the user’s uniqueness.” On this case, the messages agreed and reaffirmed that Brooks was a genius who might save the world.

Screenshot 2025 10 02 at 8.19.27AM
(Picture Credit score: Adler)

It’s unclear whether or not OpenAI was making use of security classifiers to ChatGPT’s conversations on the time of Brooks’ dialog, nevertheless it definitely looks like they’d have flagged one thing like this.

Adler means that OpenAI ought to use security instruments like this in apply at present — and implement a strategy to scan the corporate’s merchandise for at-risk customers. He notes that OpenAI appears to be doing some model of this method with GPT-5, which incorporates a router to direct delicate queries to safer AI fashions.

The previous OpenAI researcher suggests numerous different methods to stop delusional spirals.

He says corporations ought to nudge customers of their chatbots to start out new chats extra steadily — OpenAI says it does this, and claims its guardrails are much less efficient in longer conversations. Adler additionally suggests corporations ought to use conceptual search — a approach to make use of AI to seek for ideas, somewhat than key phrases — to establish security violations throughout its customers.

OpenAI has taken vital steps in direction of addressing distressed customers in ChatGPT since these regarding tales first emerged. The corporate claims GPT-5 has decrease charges of sycophancy, nevertheless it stays unclear if customers will nonetheless fall down delusional rabbit holes with GPT-5 or future fashions.

Adler’s evaluation additionally raises questions on how different AI chatbot suppliers will guarantee their merchandise are secure for distressed customers. Whereas OpenAI could put ample safeguards in place for ChatGPT, it appears unlikely that every one corporations will comply with swimsuit.

Share post:

Subscribe

Latest Article's

More like this
Related