An unreleased experimental reasoning model scored 35 out of 42 points at IMO 2025, given the same time as human contestants, on problems AI has long struggled with… other leading models managed only 10 to 30 percent accuracy
OpenAI’s artificial intelligence (AI) model has reached a milestone in catching up with human intelligence at the International Mathematical Olympiad (IMO).
The result is a show of strength at a time when talk of a crisis has emerged over the failed acquisition of the start-up Windsurf and the subsequent outflow of talent.
“We achieved gold medal-level performance at this year’s International Mathematical Olympiad (IMO 2025) with a general-purpose reasoning model,” OpenAI CEO Sam Altman said on X (formerly Twitter) on the 19th (local time).
“When we first started OpenAI, this was the stuff of dreams. It is an important indicator of how far AI has advanced over the past decade,” he said, explaining the significance of the achievement.
The result comes from an experimental reasoning large language model (LLM) being tested internally by a small team led by Alexander Wei, a research scientist at OpenAI.
The IMO is a prestigious competition held since 1959, in which students under the age of 20 compete as representatives of their countries. It is known for demanding mathematical reasoning and creative ideas rather than problems that can be solved by memorizing formulas.
According to OpenAI, the test was conducted under the same conditions as those for human contestants. The IMO consists of six problems in total, with contestants solving three problems in four and a half hours on each of two days.
OpenAI’s model solved five of the six problems for 35 out of 42 points (each problem is worth seven), a score equivalent to a gold medal.
At this year’s IMO, humans still outperformed the AI, with six contestants earning perfect scores, but the result is seen as a symbolic event showing how close rapidly advancing LLMs have come to human intelligence.
Until now, LLMs had barely managed bronze or silver medal scores at the IMO, let alone gold.
Google DeepMind’s AlphaProof and AlphaGeometry 2 earned a silver medal-level score last year, but those models were specialized solely for mathematics.
Noam Brown, an OpenAI research scientist, said, “The achievements AI has previously shown in Go and poker were the result of researchers spending years training AI to master only that specific domain. This model, however, is not an IMO-specific system but a reasoning LLM that incorporates new experimental general-purpose techniques.”
According to MathArena, a project at the Swiss Federal Institute of Technology (ETH Zurich) that tracks the mathematical performance of major models, none of the world’s leading models, including Google’s Gemini 2.5 Pro, xAI’s Grok 4, and DeepSeek’s R1, even reached bronze medal level at this year’s IMO 2025.
The specific techniques behind OpenAI’s breakthrough have not been disclosed.
“We developed new methods that make LLMs much better at tasks that are hard to verify,” Brown said. “o1 (an existing reasoning model) thinks for seconds, and the ‘deep research’ feature thinks for minutes, but this model thinks for hours.”
Meanwhile, the result has not been verified by a third party, as it comes from a closed experimental model that OpenAI has not officially released. MathArena said in response, “We are excited to see such steep progress in this area and look forward to the model’s release, which would enable independent evaluation through public benchmarks.”