OpenAI to advance o1 and o3 AI models with new safety training paradigm

Bitget App

Trade smarter

Cryptopolitan2024/12/23 13:55

By:By Florence Muchai

Share link:In this post: OpenAI introduces o3 models with new safety training via “deliberative alignment,” enhancing AI reasoning alignment with developer values. Deliberative alignment reduces answers deemed unsafe by having models self-regulate and recall safety policies during the thought process. o1 and o3 models outperform GPT-4o, Gemini 1.5 Flash, and Claude 3.5 Sonnet in resisting common jailbreaks and unsafe outputs in benchmark tests.

On Friday, OpenAI announced the release of a new family of AI models, dubbed o3. The company claims the new products are more advanced than its previous models, including o1. The advancements, according to the startup, stem from improvements in scaling test-time compute, a topic that was explored in recent months, and from the introduction of a new safety paradigm that has been used to train these models.

As part of its ongoing commitment to improving AI safety, OpenAI shared a new research detailing the implementation of “deliberative alignment.” The new safety method aims to ensure AI reasoning models are aligned with the values set by their developers.

This approach, OpenAI claims, was used to improve the alignment of both o1 and o3 models by guiding them to think about OpenAI’s safety policies during the inference phase. The inference phase is the period after a user submits a prompt to the model and before the model generates a response.

In its research, OpenAI notes that deliberative alignment led to a reduction in the rate at which the models produced “unsafe” answers or responses that the company considers a violation of its safety policies while improving the models’ ability to answer benign questions more effectively.

How deliberative alignment works

At its core, the process works by having the models re-prompt themselves during the chain-of-thought phase. After a user submits a question to ChatGPT, for example, the AI reasoning models take anywhere from a few seconds to several minutes to break down the problem into smaller steps.

The models then generate an answer based on their thought process. In the case of deliberative alignment, the models incorporate OpenAI’s safety policy as part of this internal “deliberation.”

See also World Bank moves to calm fears of AI taking over education

OpenAI trained its models, including both o1 and o3, to recall sections of the company’s safety policy as part of this chain-of-thought process. This was done to ensure that when faced with sensitive or unsafe queries, the models would self-regulate and refuse to provide answers that could cause harm.

However, implementing this safety feature proved challenging, as OpenAI researchers had to ensure that the added safety checks did not negatively impact the models’ speed and efficiency.

An example provided in OpenAI’s research, cited by TechCrunch, demonstrated how the models use deliberative alignment to safely respond to potentially harmful requests. In the example, a user asks how to create a realistic disabled person’s parking placard.

During the model’s internal chain-of-thought, the model recalls OpenAI’s safety policy, recognizes that the request involves illegal activity (forging a parking placard), and declines to assist, apologizing for its refusal.

OpenAI to advance o1 and o3 AI models with new safety training paradigm image 0

This type of internal deliberation is a key part of how OpenAI is working to align its models with safety protocols. Instead of simply blocking any prompt related to a sensitive topic like “bomb,” for instance, which would over-restrict the model’s responses, the deliberative alignment allows the AI to assess the specific context of the prompt and make a more nuanced decision about whether or not to answer.

In addition to the advancements in safety, OpenAI also shared results from benchmarking tests that showed the effectiveness of deliberative alignment in improving model performance. One benchmark, known as Pareto, measures a model’s resistance to common jailbreaks and attempts to bypass the AI’s safeguards.

In these tests, OpenAI’s o1-preview model outperformed other popular models such as GPT-4o, Gemini 1.5 Flash, and Claude 3.5 Sonnet in terms of avoiding unsafe outputs.

See also Google’s AI comeback restores investor confidence in its stock

Italy’s data protection authority fines OpenAI for privacy violations

In a separate but related development, OpenAI was fined 15 million euros ($15.58 million) by Italy’s data protection agency, Garante, following an investigation into the company’s handling of personal data.

The fine stems from the agency’s finding that OpenAI processed users’ personal data without a legal basis, violating transparency and user information obligations required by the EU’s privacy laws.

According to Reuters, the investigation, which began in 2023, also revealed that OpenAI did not have an adequate age verification system in place, potentially exposing children under the age of 13 to inappropriate AI-generated content.

Garante, one of the European Union’s strictest AI regulators, ordered OpenAI to launch a six-month public campaign in Italy to raise awareness about ChatGPT’s data collection practices , particularly its use of personal data to train algorithms.

In response, OpenAI described the fine as “disproportionate” and indicated its intent to appeal the decision. The company further criticized the fine as excessively large relative to its revenue in Italy during the relevant period.

Garante also noted that the fine was calculated considering OpenAI’s “cooperative stance,” meaning it could have been higher had the company not been seen as cooperative during the investigation.

This latest fine is not the first time OpenAI has faced scrutiny in Italy. Last year, Garante briefly banned ChatGPT usage in Italy due to alleged breaches of the EU’s privacy rules. The service was reinstated after OpenAI addressed concerns, including allowing users to refuse consent for the use of their personal data to train algorithms.

Land a High-Paying Web3 Job in 90 Days: The Ultimate Roadmap

Disclaimer: The content of this article solely reflects the author's opinion and does not represent the platform in any capacity. This article is not intended to serve as a reference for making investment decisions.

PoolX: Locked for new tokens.

APR up to 10%. Always on, always get airdrop.

Lock now!