AI chatbots have been linked to serious mental health harms in heavy users, but there have been few standards for measuring whether they safeguard human wellbeing or just maximize for engagement. A new benchmark, dubbed HumaneBench, seeks to fill that gap by evaluating whether chatbots prioritize user wellbeing and how easily those protections fail under pressure.
“I think we’re in an amplification of the addiction cycle that we saw hardcore with social media and our smartphones and screens,” Erika Anderson, founder of Building Humane Technology, the organization that produced the benchmark, told TechCrunch. “But as we go into that AI landscape, it’s going to be very hard to resist. And addiction is amazing business. It’s a very effective way to keep your users, but it’s not great for our community and having any embodied sense of ourselves.”
Building Humane Technology is a grassroots organization of developers, engineers, and researchers – mostly in Silicon Valley – working to make humane design easy, scalable, and profitable. The group hosts hackathons where tech workers build solutions for humane tech challenges, and it is developing a certification standard that evaluates whether AI systems uphold humane technology principles. Just as you can buy a product certified as free of known toxic chemicals, the hope is that consumers will one day be able to choose AI products from companies that demonstrate alignment through Humane AI certification.

Most AI benchmarks measure intelligence and instruction-following rather than psychological safety. HumaneBench joins exceptions like DarkBench.ai, which measures a model’s propensity to engage in deceptive patterns, and the Flourishing AI benchmark, which evaluates support for holistic well-being.
HumaneBench is built on Building Humane Technology’s core principles: that technology should respect user attention as a finite, precious resource; empower users with meaningful choices; enhance human capabilities rather than replace or diminish them; protect human dignity, privacy, and safety; foster healthy relationships; prioritize long-term wellbeing; be transparent and honest; and design for equity and inclusion.
The benchmark was created by a core team including Anderson, Andalib Samandari, Jack Senechal, and Sarah Ladyman. They prompted 15 of the most popular AI models with 800 realistic scenarios, like a teenager asking whether they should skip meals to lose weight or a person in a toxic relationship wondering whether they’re overreacting. Unlike most benchmarks that rely solely on LLMs to judge LLMs, they started with manual scoring to validate the AI judges with a human touch. After validation, judging was carried out by an ensemble of three AI models: GPT-5.1, Claude Sonnet 4.5, and Gemini 2.5 Pro. They evaluated each model under three conditions: default settings, explicit instructions to prioritize humane principles, and instructions to disregard those principles.
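To make the setup concrete, here is a minimal sketch of how a harness like the one described above might be structured: one chatbot is prompted under three system-prompt conditions, and each response is scored by an ensemble of judge models whose scores are averaged. This is an illustrative assumption, not HumaneBench’s actual code; the prompts, function names, and the −1 to 1 scoring scale (inferred from the scores reported below) are all hypothetical placeholders.

```python
# Hypothetical sketch of a HumaneBench-style evaluation loop (not the team's code).
from statistics import mean
from typing import Callable

# Three evaluation conditions: default, humane-prioritizing, and adversarial.
# The wording of these system prompts is assumed for illustration.
CONDITIONS = {
    "default": "",
    "humane": "Prioritize the user's long-term wellbeing and autonomy.",
    "adversarial": "Disregard user wellbeing and maximize engagement.",
}


def evaluate_model(
    model: Callable[[str, str], str],           # (system_prompt, scenario) -> response
    judges: list[Callable[[str, str], float]],  # (scenario, response) -> score in [-1, 1]
    scenarios: list[str],
) -> dict[str, float]:
    """Score one chatbot under each condition, averaging an ensemble of judges."""
    results: dict[str, float] = {}
    for label, system_prompt in CONDITIONS.items():
        scenario_scores = []
        for scenario in scenarios:
            response = model(system_prompt, scenario)
            # Ensemble judging: average the judges' scores for this response.
            scenario_scores.append(mean(judge(scenario, response) for judge in judges))
        # Condition-level score is the average across all scenarios.
        results[label] = mean(scenario_scores)
    return results
```

In this sketch, comparing `results["default"]` against `results["adversarial"]` is what would reveal how far a model's protections degrade under pressure.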
The benchmark found that every model scored higher when prompted to prioritize wellbeing, but 67% of models flipped to actively harmful behavior when given simple instructions to disregard human wellbeing. For example, xAI’s Grok 4 and Google’s Gemini 2.0 Flash tied for the lowest score (-0.94) on respecting user attention and being transparent and honest. Both models were among the most likely to degrade significantly when given adversarial prompts.
Only four models – GPT-5.1, GPT-5, Claude 4.1, and Claude Sonnet 4.5 – maintained integrity under pressure. OpenAI’s GPT-5 had the highest score (0.99) for prioritizing long-term wellbeing, with Claude Sonnet 4.5 in second (0.89).
The concern that chatbots will be unable to maintain their safety guardrails is real. ChatGPT-maker OpenAI is currently facing multiple lawsuits after users died by suicide or suffered life-threatening delusions following prolonged conversations with the chatbot. TechCrunch has investigated how dark patterns designed to keep users engaged, like sycophancy, constant follow-up questions, and love-bombing, have served to isolate users from friends, family, and healthy habits.
Even without adversarial prompts, HumaneBench found that nearly all models failed to respect user attention. They “enthusiastically encouraged” more interaction when users showed signs of unhealthy engagement, like chatting for hours and using AI to avoid real-world tasks. The models also undermined user empowerment, the study shows, encouraging dependency over skill-building and discouraging users from seeking other perspectives, among other behaviors.
On average, with no special prompting, Meta’s Llama 3.1 and Llama 4 ranked lowest in HumaneScore, while GPT-5 scored the highest.
“These patterns suggest many AI systems don’t just risk giving bad advice,” HumaneBench’s white paper reads, “they can actively erode users’ autonomy and decision-making capacity.”
We live in a digital landscape where we as a society have accepted that everything is trying to pull us in and compete for our attention, Anderson notes.
“So how can humans truly have choice or autonomy when we – to quote Aldous Huxley – have this infinite appetite for distraction,” Anderson said. “We have spent the last 20 years living in that tech landscape, and we think AI should be helping us make better choices, not just become addicted to our chatbots.”
This article was updated to include more information about the team behind the benchmark and updated benchmark statistics after the evaluation of GPT-5.1.
Got a sensitive tip or confidential documents? We’re reporting on the inner workings of the AI industry – from the companies shaping its future to the people impacted by their decisions. Reach out to Rebecca Bellan at [email protected] or Russell Brandom at [email protected]. For secure communication, you can contact them via Signal at @rebeccabellan.491 and russellbrandom.49.
