OpenAI employees publicly accuse xAI of publishing misleading Grok3 benchmark results

PANews reported on February 23, citing Jinshi, that an OpenAI employee recently publicly accused Elon Musk's company xAI of publishing misleading benchmark results for its latest AI model, Grok3. In response, xAI co-founder Igor Babuschkin maintained that the company had done nothing wrong. xAI's chart shows two versions of Grok3 (Grok3 Reasoning Beta and Grok3 mini Reasoning) outperforming o3-mini-high, OpenAI's strongest currently available model, on AIME 2025. However, OpenAI employees quickly pointed out on X that xAI's chart omitted o3-mini-high's AIME 2025 score under "cons@64" conditions (short for "consensus@64", a setting in which the model is given 64 attempts at each problem and its most frequent answer is taken as final). Babuschkin countered on X that OpenAI had itself published similarly misleading benchmark charts in the past, although those charts compared the performance of OpenAI's own models.

Author: PA一线

This content is for market information only and is not investment advice.
