Anthropic used Pokémon to benchmark its latest AI model | TechCrunch

Anthropic used Pokémon to benchmark its latest AI model. Yes, really.

In a blog post published Monday, Anthropic said that it tested its latest model, Claude 3.7 Sonnet, on the Game Boy classic Pokémon Red. The company equipped the model with basic memory, screen pixel input, and function calls to press buttons and navigate around the screen, allowing it to play Pokémon continuously.
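Anthropic's actual harness is not described in detail here, but the three ingredients it names (basic memory, screen pixel input, and function calls that press buttons) suggest a simple observe-decide-act loop. The Python sketch below is illustrative only: `StubEmulator`, `Memory`, `run_agent`, and `placeholder_policy` are hypothetical names, the emulator is a stand-in that returns a made-up pixel grid, and the policy is a trivial rule standing in for the call to the model.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Button names are an assumption based on the Game Boy's physical controls.
BUTTONS = ("A", "B", "UP", "DOWN", "LEFT", "RIGHT", "START", "SELECT")
DIRECTIONS = ("UP", "RIGHT", "DOWN", "LEFT")

Frame = List[List[int]]  # rows of pixel shades, 0-3


class StubEmulator:
    """Hypothetical stand-in for a Game Boy emulator (not a real library)."""

    def __init__(self, width: int = 4, height: int = 4) -> None:
        self.width = width
        self.height = height
        self.presses: List[str] = []

    def screen(self) -> Frame:
        # Fake frame: the shade changes with every button press.
        shade = len(self.presses) % 4
        return [[shade] * self.width for _ in range(self.height)]

    def press(self, button: str) -> None:
        if button not in BUTTONS:
            raise ValueError(f"unknown button: {button}")
        self.presses.append(button)


@dataclass
class Memory:
    """Basic memory: keeps only the most recent notes."""

    limit: int = 5
    notes: List[str] = field(default_factory=list)

    def remember(self, note: str) -> None:
        self.notes.append(note)
        self.notes = self.notes[-self.limit:]


def run_agent(
    emulator: StubEmulator,
    choose_button: Callable[[Frame, List[str]], str],
    steps: int,
) -> Memory:
    """Observe the screen, pick a button, press it, record what happened."""
    memory = Memory()
    for step in range(steps):
        frame = emulator.screen()                    # screen pixel input
        button = choose_button(frame, memory.notes)  # model call would go here
        emulator.press(button)                       # function call: press a button
        memory.remember(f"step {step}: pressed {button}")
    return memory


def placeholder_policy(frame: Frame, notes: List[str]) -> str:
    # Placeholder for the model: maps the top-left pixel shade to a direction.
    return DIRECTIONS[frame[0][0] % len(DIRECTIONS)]


if __name__ == "__main__":
    emu = StubEmulator()
    mem = run_agent(emu, placeholder_policy, steps=6)
    print(emu.presses)
    print(mem.notes)
```

In a real setup, the placeholder policy would be replaced by a request to the model that includes the current frame and the stored notes, and the model's chosen function call would be translated into a button press.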

A unique feature of Claude 3.7 Sonnet is its ability to engage in “extended thinking.” Like OpenAI’s o3-mini and DeepSeek’s R1, Claude 3.7 Sonnet can “reason” through challenging problems by applying more computing and taking more time.

That came in handy in Pokémon Red, apparently.

Compared to a previous version of Claude, Claude 3.0 Sonnet, which failed to leave the house in Pallet Town where the story begins, Claude 3.7 Sonnet successfully battled three Pokémon gym leaders and won their badges.

Image Credits: Anthropic

Now, it’s not clear how much computing was required for Claude 3.7 Sonnet to reach these milestones, or how long each took. Anthropic only said that the model performed 35,000 actions to reach the last gym leader, Surge.

It surely won’t be long before some enterprising developer finds out.

Pokémon Red is more of a toy benchmark than anything. However, there is a long history of games being used for AI benchmarking purposes. In the past few months alone, a number of new apps and platforms have cropped up to test models’ game-playing abilities on titles ranging from Street Fighter to Pictionary.
