Google’s SIMA 2 agent uses Gemini to reason and act in virtual worlds | TechCrunch

Google DeepMind shared on Thursday a research preview of SIMA 2, the next generation of its generalist AI agent that integrates the language and reasoning powers of Gemini, Google’s large language model, to move beyond simply following instructions to understanding and interacting with its environment.

Like many of DeepMind’s projects, including AlphaFold, the first version of SIMA was trained on hundreds of hours of video game data to learn how to play multiple 3D games like a human, even some games it wasn’t trained on. SIMA 1, unveiled in March 2024, could follow basic instructions across a wide range of virtual environments, but it only had a 31% success rate for completing complex tasks, compared to 71% for humans.

“SIMA 2 is a step change and improvement in capabilities over SIMA 1,” Joe Marino, senior research scientist at DeepMind, said in a press briefing. “It’s a more general agent. It can complete complex tasks in previously unseen environments. And it’s a self-improving agent. So it can actually self-improve based on its own experience, which is a step towards more general-purpose robots and AGI systems more generally.”

DeepMind says SIMA 2 doubles the performance of SIMA 1. Image Credits: Google DeepMind

SIMA 2 is powered by the Gemini 2.5 Flash-Lite model, and AGI refers to artificial general intelligence, which DeepMind defines as a system capable of a wide range of intellectual tasks with the ability to learn new skills and generalize knowledge across different areas.

Working with so-called “embodied agents” is crucial to generalized intelligence, DeepMind’s researchers say. Marino explained that an embodied agent interacts with a physical or virtual world via a body, observing inputs and taking actions much like a robot or human would, whereas a non-embodied agent might interact with your calendar, take notes, or execute code.

Jane Wang, a senior staff research scientist at DeepMind with a background in neuroscience, told TechCrunch that SIMA 2 goes far beyond gameplay.

“We’re asking it to actually understand what’s happening, understand what the user is asking it to do, and then be able to respond in a common-sense way that’s actually quite difficult,” Wang said.

By integrating Gemini, SIMA 2 doubled its predecessor’s performance, uniting Gemini’s advanced language and reasoning abilities with the embodied skills developed through training.

Marino demoed SIMA 2 in “No Man’s Sky,” where the agent described its surroundings, a rocky planet surface, and determined its next steps by recognizing and interacting with a distress beacon. SIMA 2 also uses Gemini to reason internally. In another game, when asked to walk to the house that’s the color of a ripe tomato, the agent showed its thinking (ripe tomatoes are red, therefore I should go to the red house), then found and approached it.

Being Gemini-powered also means SIMA 2 follows instructions based on emojis: “You instruct it 🪓🌲, and it’ll go chop down a tree,” Marino said.

Marino also demonstrated how SIMA 2 can navigate newly generated photorealistic worlds produced by Genie, DeepMind’s world model, correctly identifying and interacting with objects like benches, trees, and butterflies.

DeepMind says SIMA 2 is a self-improving agent. Image Credits: Google DeepMind

Gemini also enables self-improvement without much human data, Marino added. Where SIMA 1 was trained entirely on human gameplay, SIMA 2 uses it as a baseline to provide a strong initial model. When the team puts the agent into a new environment, it asks another Gemini model to create new tasks and a separate reward model to score the agent’s attempts. Using these self-generated experiences as training data, the agent learns from its own mistakes and gradually performs better, essentially teaching itself new behaviors through trial and error as a human would, guided by AI-based feedback instead of humans.
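
The loop Marino describes can be pictured with a toy sketch. This is not DeepMind's code: `ToyAgent`, `propose_tasks`, `score_attempt`, and `self_improve` are hypothetical stand-ins, the task proposer and reward model are trivial placeholder functions in place of Gemini models, and the update rule is an arbitrary illustrative choice. It only shows the shape of the cycle: tasks are generated, the agent attempts them, a separate scorer rates each attempt, and the scored attempts become training data.

```python
import random
from dataclasses import dataclass, field

BASELINE = 0.2  # assumed starting success probability on an unseen task


@dataclass
class ToyAgent:
    """Hypothetical stand-in for the agent: one success probability per task."""
    skill: dict = field(default_factory=dict)

    def attempt(self, task: str, rng: random.Random) -> bool:
        return rng.random() < self.skill.get(task, BASELINE)

    def update(self, experiences: list) -> None:
        # Illustrative rule only: every scored attempt moves skill toward 1.0,
        # with rewarded attempts counting for more than failed ones.
        for task, reward in experiences:
            current = self.skill.get(task, BASELINE)
            step = 0.2 if reward > 0 else 0.05
            self.skill[task] = current + step * (1.0 - current)


def propose_tasks(environment: str, n: int) -> list:
    """Placeholder for the model that invents new tasks in an environment."""
    return [f"{environment}: task {i}" for i in range(n)]


def score_attempt(task: str, succeeded: bool) -> float:
    """Placeholder for the separate reward model that scores an attempt."""
    return 1.0 if succeeded else 0.0


def self_improve(agent: ToyAgent, environment: str, rounds: int, seed: int = 0) -> list:
    """Run the generate, attempt, score, train cycle; return mean reward per round."""
    rng = random.Random(seed)
    tasks = propose_tasks(environment, n=3)
    history = []
    for _ in range(rounds):
        experiences = [(t, score_attempt(t, agent.attempt(t, rng))) for t in tasks]
        agent.update(experiences)  # self-generated experience is the training data
        history.append(sum(r for _, r in experiences) / len(experiences))
    return history


if __name__ == "__main__":
    agent = ToyAgent()
    rewards = self_improve(agent, "new_world", rounds=30)
    print("first round:", rewards[0], "last round:", rewards[-1])
```

No human labels enter the loop after the starting baseline, which is the point of the approach as described: the feedback signal comes from a model rather than from people.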

DeepMind sees SIMA 2 as a step toward unlocking more general-purpose robots.

“If we think of what a system needs to do to perform tasks in the real world, like a robot, I think there are two components of it,” Frederic Besse, senior staff research engineer at DeepMind, said during a press briefing. “First, there is a high-level understanding of the real world and what needs to be done, as well as some reasoning.”

If you ask a humanoid robot in your house to go check how many cans of beans you have in the cupboard, the system needs to understand all of the different concepts (what beans are, what a cupboard is) and navigate to that location. Besse says SIMA 2 touches more on that high-level behavior than it does on lower-level actions, which he refers to as controlling things like physical joints and wheels.

The team declined to share a specific timeline for implementing SIMA 2 in physical robotics systems. Besse told TechCrunch that DeepMind’s recently unveiled robotics foundation models, which can also reason about the physical world and create multi-step plans to complete a mission, were trained differently and separately from SIMA.

While there’s also no timeline for releasing more than a preview of SIMA 2, Wang told TechCrunch the goal is to show the world what DeepMind has been working on and see what kinds of collaborations and potential uses are possible.
