Shortly after Hunter Lightman joined OpenAI as a researcher in 2022, he watched his colleagues launch ChatGPT, one of the fastest-growing products ever. Meanwhile, Lightman quietly worked on a team teaching OpenAI’s models to solve high school math competitions.
Today that team, known as MathGen, is considered instrumental to OpenAI’s industry-leading effort to create AI reasoning models: the core technology behind AI agents that can do tasks on a computer like a human would.
“We were trying to make the models better at mathematical reasoning, which at the time they weren’t very good at,” Lightman told TechCrunch, describing MathGen’s early work.
OpenAI’s models are far from perfect today: the company’s latest AI systems still hallucinate, and its agents struggle with complex tasks.
But its state-of-the-art models have improved considerably at mathematical reasoning. One of OpenAI’s models recently won a gold medal at the International Math Olympiad, a math competition for the world’s brightest high school students. OpenAI believes these reasoning capabilities will translate to other subjects, and ultimately power the general-purpose agents the company has always dreamed of building.
ChatGPT was a happy accident, a low-key research preview turned viral consumer business, but OpenAI’s agents are the product of a years-long, deliberate effort inside the company.
“Eventually, you’ll just ask the computer for what you need and it’ll do all of these tasks for you,” said OpenAI CEO Sam Altman at the company’s first developer conference in 2023. “These capabilities are often talked about in the AI field as agents. The upsides of this are going to be tremendous.”
Whether agents will live up to Altman’s vision remains to be seen, but OpenAI shocked the world with the release of its first AI reasoning model, o1, in the fall of 2024. Less than a year later, the 21 foundational researchers behind that breakthrough are the most highly sought-after talent in Silicon Valley.
Mark Zuckerberg recruited five of the o1 researchers to work on Meta’s new superintelligence-focused unit, offering some compensation packages north of $100 million. One of them, Shengjia Zhao, was recently named chief scientist of Meta Superintelligence Labs.
The reinforcement learning renaissance
The rise of OpenAI’s reasoning models and agents is tied to a machine learning training technique known as reinforcement learning (RL). RL gives an AI model feedback on whether its choices were correct or not in simulated environments.
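As a toy illustration of that feedback loop (invented for this sketch, and nothing like OpenAI's actual training setup), here is an epsilon-greedy bandit in Python: the agent repeatedly tries actions in a simulated environment, is told whether each choice paid off, and gradually reinforces the one that earns the most reward.

```python
import random

# Toy RL loop: 3 possible actions with hidden success rates.
# The agent only ever sees reward feedback, never the rates themselves.
REWARD_PROBS = [0.2, 0.5, 0.8]   # hidden quality of each action
values = [0.0, 0.0, 0.0]         # the agent's learned estimates
counts = [0, 0, 0]

random.seed(0)
for step in range(2000):
    # Explore a random action 10% of the time; otherwise exploit
    # the action currently believed to be best.
    if random.random() < 0.1:
        action = random.randrange(3)
    else:
        action = max(range(3), key=lambda a: values[a])
    # The "environment" tells the agent whether its choice worked out.
    reward = 1.0 if random.random() < REWARD_PROBS[action] else 0.0
    # Feedback step: nudge the estimate toward the observed reward.
    counts[action] += 1
    values[action] += (reward - values[action]) / counts[action]

# After enough feedback, the agent settles on the highest-reward action.
print(max(range(3), key=lambda a: values[a]))
```

The same loop, scaled up enormously and applied to language models judging their own multi-step outputs, is the basic shape of RL training.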
RL has been used for decades. In 2016, roughly a year after OpenAI was founded in 2015, AlphaGo, an AI system Google DeepMind created using RL, gained global attention after beating a world champion at the board game Go.

Around that time, one of OpenAI’s first employees, Andrej Karpathy, began pondering how to leverage RL to create an AI agent that could use a computer. But it would take years for OpenAI to develop the necessary models and training techniques.
By 2018, OpenAI had pioneered its first large language model in the GPT series, pretrained on massive amounts of internet data and large clusters of GPUs. GPT models excelled at text processing, eventually leading to ChatGPT, but they struggled with basic math.
It took until 2023 for OpenAI to achieve a breakthrough, initially dubbed “Q*” and then “Strawberry,” by combining LLMs, RL, and a technique called test-time computation. The latter gave the models extra time and computing power to plan and work through problems, verifying their steps, before producing an answer.
This allowed OpenAI to introduce a new approach called “chain-of-thought” (CoT), which improved AI’s performance on math questions the models hadn’t seen before.
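One simple, publicly known form of test-time computation (a stand-in for the idea, not OpenAI's actual method) is self-consistency sampling: spend extra compute drawing many independent attempts at a problem, then keep the most common final answer. The `noisy_solver` below is an invented stub that plays the role of a model sampling one chain of thought.

```python
import random
from collections import Counter

def noisy_solver(question, rng):
    """Invented stub standing in for one sampled chain of thought.

    Returns the right answer most of the time, a wrong one otherwise.
    """
    return 42 if rng.random() < 0.7 else rng.choice([41, 43])

def answer_with_test_time_compute(question, samples=101, seed=0):
    # Extra compute at answer time: draw many independent attempts
    # instead of one, then take a majority vote over final answers.
    rng = random.Random(seed)
    answers = [noisy_solver(question, rng) for _ in range(samples)]
    return Counter(answers).most_common(1)[0][0]

# A single sample is wrong 30% of the time; the vote over 101
# samples almost always converges on the common answer, 42.
print(answer_with_test_time_compute("What is 6 * 7?"))
```

The trade is explicit: each additional sample costs compute at answer time, but errors that are random across attempts get voted out.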
“I could see the model starting to reason,” said OpenAI researcher El Kishky. “It would notice mistakes and backtrack, it would get frustrated. It really felt like reading the thoughts of a person.”
Although individually these methods weren’t novel, OpenAI uniquely mixed them to create Strawberry, which instantly led to the event of o1. OpenAI rapidly recognized that the planning and truth checking talents of AI reasoning fashions may very well be helpful to energy AI brokers.
“We had solved a problem that I had been banging my head against for a couple of years,” mentioned Lightman. “It was one of the most exciting moments of my research career.”
Scaling reasoning
With AI reasoning models, OpenAI determined it had two new axes along which to improve its AI: applying more computational power during the post-training of AI models, and giving AI models more time and processing power while answering a question.
“OpenAI, as a company, thinks a lot about not just the way things are, but the way things are going to scale,” said Lightman.
Shortly after the 2023 Strawberry breakthrough, OpenAI spun up an “Agents” team led by OpenAI researcher Daniel Selsam to make further progress on this new paradigm, two sources told TechCrunch. Although the team was called “Agents,” OpenAI didn’t initially differentiate between reasoning models and agents as we think of them today. The company just wanted to make AI systems capable of completing complex tasks.
Eventually, the work of Selsam’s Agents team became part of a larger project to develop the o1 reasoning model, with leaders including OpenAI co-founder Ilya Sutskever, chief research officer Mark Chen, and chief scientist Jakub Pachocki.

OpenAI had to divert precious resources, mainly talent and GPUs, to create o1. Throughout OpenAI’s history, researchers have had to negotiate with company leaders to obtain resources; demonstrating breakthroughs was a surefire way to secure them.
“One of the core components of OpenAI is that everything in research is bottom up,” said Lightman. “When we showed the evidence [for o1], the company was like, ‘This makes sense, let’s push on it.’”
Some former employees say the startup’s mission to develop AGI was the key factor in achieving breakthroughs around AI reasoning models. By focusing on developing the smartest-possible AI models, rather than products, OpenAI was able to prioritize o1 above other efforts. That kind of large investment in ideas wasn’t always possible at competing AI labs.
The decision to try new training methods proved prescient. By late 2024, several leading AI labs started seeing diminishing returns from models created through traditional pretraining scaling. Today, much of the AI field’s momentum comes from advances in reasoning models.
What does it mean for an AI to “reason”?
In many ways, the goal of AI research is to recreate human intelligence with computers. Since the launch of o1, ChatGPT’s UX has been filled with more human-sounding features such as “thinking” and “reasoning.”
When asked whether OpenAI’s models were truly reasoning, El Kishky hedged, saying he thinks about the concept in terms of computer science.
“We’re teaching the model how to efficiently expend compute to get an answer. So if you define it that way, yes, it is reasoning,” said El Kishky.
Lightman’s approach is to focus on the model’s outcomes, and not as much on the means or their relation to human brains.

“If the model is doing hard things, then it is doing whatever necessary approximation of reasoning it needs in order to do that,” said Lightman. “We can call it reasoning, because it looks like these reasoning traces, but it’s all just a proxy for trying to make AI tools that are really powerful and useful to a lot of people.”
OpenAI’s researchers note that people may disagree with their nomenclature or definitions of reasoning (and indeed, critics have emerged), but they argue it matters less than the capabilities of their models. Other AI researchers tend to agree.
Nathan Lambert, an AI researcher with the nonprofit Ai2, compares AI reasoning models to airplanes in a blog post. Both, he says, are man-made systems inspired by nature (human reasoning and bird flight, respectively), yet they operate through entirely different mechanisms. That doesn’t make them any less useful, or any less capable of achieving similar outcomes.
A group of AI researchers from OpenAI, Anthropic, and Google DeepMind agreed in a recent position paper that AI reasoning models are not well understood today, and that more research is needed. It may be too early to confidently claim what exactly is going on inside them.
The next frontier: AI agents for subjective tasks
The AI agents on the market today work best for well-defined, verifiable domains such as coding. OpenAI’s Codex agent aims to help software engineers offload simple coding tasks. Meanwhile, Anthropic’s models have become particularly popular in AI coding tools like Cursor and Claude Code; these are some of the first AI agents that people are willing to pay for.
However, general-purpose AI agents like OpenAI’s ChatGPT Agent and Perplexity’s Comet struggle with many of the complex, subjective tasks people want to automate. When trying to use these tools for online shopping or finding a long-term parking spot, I’ve found the agents take longer than I’d like and make silly mistakes.
Agents are, of course, early systems that will undoubtedly improve. But researchers must first figure out how to better train the underlying models to complete tasks that are more subjective.

“Like many problems in machine learning, it’s a data problem,” said Lightman, when asked about the limitations of agents on subjective tasks. “Some of the research I’m really excited about right now is figuring out how to train on less verifiable tasks. We have some leads on how to do these things.”
Noam Brown, an OpenAI researcher who helped create the IMO model and o1, told TechCrunch that OpenAI has new general-purpose RL techniques that let it teach AI models skills that aren’t easily verified. This was how the company built the model that achieved a gold medal at IMO, he said.
OpenAI’s IMO model was a newer AI system that spawns multiple agents, which simultaneously explore several ideas and then choose the best answer. These types of AI models are growing in popularity; Google and xAI have recently released state-of-the-art models using this technique.
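At a high level, this parallel technique resembles best-of-n selection: spawn several workers, let each explore an idea independently, and keep whichever candidate a scorer ranks highest. The sketch below is a deliberately tiny toy; `explore_idea` and its scoring are invented placeholders, not anything from OpenAI's system.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def explore_idea(seed):
    """Invented stand-in for one agent pursuing one line of attack.

    Returns a candidate answer with a score a verifier might assign.
    """
    rng = random.Random(seed)          # each agent explores differently
    quality = rng.random()             # placeholder for a verifier score
    return {"answer": f"proof sketch #{seed}", "score": quality}

def best_of_n(n=8):
    # Spawn n "agents" in parallel, let each explore independently,
    # then keep the candidate with the highest score.
    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(explore_idea, range(n)))
    return max(candidates, key=lambda c: c["score"])

print(best_of_n()["answer"])
```

The design choice is the same trade as other test-time techniques: n times the compute per question, in exchange for a much better chance that at least one exploration succeeds.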
“I think these models will become more capable at math, and I think they’ll get more capable in other reasoning areas as well,” said Brown. “The progress has been incredibly fast. I don’t see any reason to think it will slow down.”
These techniques may help OpenAI’s models become more performant, gains that could show up in the company’s upcoming GPT-5 model. OpenAI hopes to assert its dominance over competitors with the launch of GPT-5, ideally offering the best AI model to power agents for developers and consumers.
But the company also wants to make its products simpler to use. El Kishky says OpenAI wants to develop AI agents that intuitively understand what users want, without requiring them to select specific settings. He says OpenAI aims to build AI systems that understand when to call up certain tools, and how long to reason for.
These ideas paint a picture of an ultimate version of ChatGPT: an agent that can do anything on the internet for you, and understand how you want it done. That’s a much different product than ChatGPT is today, but the company’s research is squarely headed in this direction.
While OpenAI undoubtedly led the AI industry a few years ago, the company now faces a tranche of worthy opponents. The question is no longer just whether OpenAI can deliver its agentic future, but whether it can do so before Google, Anthropic, xAI, or Meta beats it there.