OpenAI Secretly Funded Benchmarking Dataset Linked To o3 Model

Revelations that OpenAI secretly funded and had access to the FrontierMath benchmarking dataset raise concerns about whether the dataset was used to train its o3 AI reasoning model, and about the validity of the model’s high scores.

In addition to having access to the benchmarking dataset, OpenAI funded its creation, a fact that was withheld from the mathematicians who contributed to the development of FrontierMath. Epoch AI only belatedly disclosed OpenAI’s funding, in the final version of the paper published on arXiv.org that announced the benchmark. Earlier versions of the paper omitted any mention of OpenAI’s involvement.

Screenshot of the FrontierMath paper

Close-up of the acknowledgment

A previous version of the paper that lacked the acknowledgment

The OpenAI o3 model achieved a high score on the FrontierMath benchmark

The news of OpenAI’s secret involvement raises questions about the high scores achieved by the o3 AI reasoning model and has caused disappointment with the FrontierMath project. Epoch AI responded with transparency about what happened and what it is doing to check whether the o3 model was trained with the FrontierMath dataset.

Giving OpenAI access to the dataset was unexpected because the whole point of a benchmark is to test AI models, and that cannot be done if the models know the questions and answers in advance.

A post in the r/singularity subreddit expressed this disappointment and cited a document that claims the mathematicians were unaware of OpenAI’s involvement:

“Frontier Math, the latest cutting-edge math benchmark, is funded by OpenAI. OpenAI supposedly has access to problems and solutions. This is disappointing because the benchmark was sold to the public as a means of evaluating frontier models, with the support of renowned mathematicians. In reality, Epoch AI builds datasets for OpenAI. They have never disclosed any links to OpenAI before.”

The Reddit discussion cited a publication that revealed OpenAI’s deeper involvement:

“Mathematicians who create problems for FrontierMath are not (actively)[2] informed about funding from OpenAI.

… Epoch AI or OpenAI are not publicly saying that OpenAI has access to the exercises or answers or solutions. I have heard second hand that OpenAI has access to the exercises and answers and is using them for validation.”

Tamay Besiroglu, an associate director at Epoch AI, acknowledged that OpenAI had access to the datasets, but also asserted that there was a “holdout” dataset that OpenAI did not have access to.

In the cited document, he wrote:

“Tamay from Epoch AI here.

We made a mistake by not being more transparent about OpenAI’s involvement. We had limited disclosure of the partnership up until the time o3 launched, and in hindsight, we should have negotiated harder to be transparent with benchmark partners sooner. Our contract specifically prevented us from disclosing information about the source of the funding and the fact that OpenAI has access to much, but not all, of the data. We own this mistake and are committed to doing better in the future.

Regarding training use: We acknowledge that OpenAI has access to a large portion of the FrontierMath problems and solutions, with the exception of a holdout set, unseen by OpenAI, that allows us to independently verify the model’s capabilities. However, we have a verbal agreement that these materials will not be used in model training.

OpenAI also fully supported our decision to maintain a separate, unseen holdout set, an additional safeguard to prevent overfitting and ensure accurate progress measurement. From day one, FrontierMath was conceived and presented as an evaluation tool, and we believe these arrangements reflect that purpose.”

More facts revealed about OpenAI and FrontierMath

Elliot Glazer, the lead mathematician at Epoch AI, confirmed that OpenAI has the dataset and was allowed to use it to evaluate o3, its next state-of-the-art AI reasoning model. He expressed his opinion that the high scores obtained by the o3 model are “legitimate” and said that Epoch AI is conducting an independent evaluation to determine whether o3 had access to the FrontierMath dataset for training, which could cast the model’s high scores in a different light.

He wrote:

“Epoch’s lead mathematician here. Yes, OAI funded this and has the dataset, which allowed them to evaluate o3 in-house. We still haven’t independently verified their 25% claim. To do this, we are currently developing a holdout dataset and will be able to test their model without them having any prior exposure to these problems.

My personal opinion is that OAI’s score is legitimate (i.e., they didn’t train on the dataset) and that they have no incentive to lie about internal benchmark performance. However, we cannot vouch for them until our independent evaluation is complete.”

Glazer also shared that Epoch AI will test o3 using a “holdout” dataset that OpenAI did not have access to, saying:

“We will evaluate o3 with OAI having had no prior exposure to the holdout problems. This will be hermetically sealed.”

In another post on Reddit, Glazer described how the “holdout set” was created:

“We will describe the process more clearly when the holdout set evaluation is complete, but we are randomly selecting the holdout problems from a larger set that will be added to FrontierMath. The production process is otherwise identical to what it has always been.”

Waiting for answers

That is where the drama stands until Epoch AI’s evaluation is completed, which will show whether OpenAI trained its AI reasoning model with the dataset or only used it for benchmarking.

Featured image: Shutterstock/Antonello Marangi
