AI Excels in Science and Engineering Subjects
"Top-Tier Performance"

AI models from OpenAI and Google have scored higher than the top human test-takers in the most competitive "Science Stream 3" track of the entrance exam at the University of Tokyo, Japan's most prestigious university.


According to a report by the Nihon Keizai Shimbun (Nikkei) on the 27th, these results were obtained from a joint study conducted with the AI startup LifePrompt and the education company Kawaijuku. The AI models used in this study were OpenAI's GPT 5.2 Thinking, Google's Gemini 3 Pro Preview, and Anthropic's Claude Opus 4.5, representing each company's most advanced offerings. The 2026 entrance exam questions were input into the AI models, and answers were generated and then graded by instructors from Kawaijuku.


In the University of Tokyo's Science Stream 3 exam (with a maximum score of 550 points), OpenAI scored 503 points and Google 496 points, both far exceeding the highest human test-taker score of 453 points. Notably, OpenAI surpassed the top human scores when combining the results of the common university entrance test and the secondary exam across all tracks and faculties at both the University of Tokyo and Kyoto University. Of the 11 subjects set by each of the two universities, OpenAI achieved perfect scores in five, including mathematics for both the humanities and science streams at both universities, as well as chemistry at Kyoto University.


Google also earned top-level results across all tracks at the University of Tokyo, achieving perfect scores in three mathematics subjects. Anthropic exceeded the passing thresholds in all tracks, including Science Stream 3 at the University of Tokyo and the Faculty of Medicine at Kyoto University, and achieved a perfect score in the Department of Physics at the University of Tokyo.


The AI models were particularly strong in science and engineering subjects. Nikkei reported that "all of the AIs performed at the level of top candidates for the University of Tokyo" and that "every one of them was at a level sufficient for admission to medical school."


Instructors who graded the answers both last year and this year observed that "the quality of the written arguments improved significantly compared to last year." OpenAI in particular drew praise this year for its highly sophisticated reasoning on the University of Tokyo's science mathematics exam, with instructors noting that its answers were far closer to model responses than in 2025. Last year, mistakes and insufficient elaboration in its written explanations meant it failed to answer even half of the questions correctly.


While OpenAI demonstrated strength in science subjects, its performance in the humanities was relatively weaker. It achieved over 90% accuracy in physics and chemistry but struggled with the University of Tokyo's Japanese language and history subjects. In the University of Tokyo's world history exam, its answers were assessed as showing "knowledge but weak logical connections," yielding a score rate of only 25%. It also lost points in some cases for exceeding the required length of responses.



Google achieved score rates of 80% and 70% in the University of Tokyo's world history and Japanese history subjects, respectively, and its total score in the humanities subjects at the University of Tokyo surpassed OpenAI's. Anthropic received the highest mark in Japanese history. Nikkei noted that these differences suggest each AI has distinct strengths depending on its training data and model characteristics.


This content was produced with the assistance of AI translation services.

© The Asia Business Daily (www.asiae.co.kr). All rights reserved.
