Open safety reasoning models with custom safety policies
gpt-oss-safeguard is a new family of open-source safety models (120b & 20b) from OpenAI. They use reasoning to classify content based on a custom, developer-provided policy at inference time, providing an explainable chain-of-thought for each decision.
It basically decouples the safety policy from the model's execution, which gives developers a much more "white-box" environment. You can even review the model's chain-of-thought to see why it made a decision.
The main benefit is that you can rapidly calibrate safety rules just by editing the policy text, without having to train a whole new classifier (which costs a lot of time and money).
Big deal for application safety, especially with all the legal risks around AI emerging.
You can try it out with the examples OpenAI provides, or just feed it your own policy to see how it works.
About gpt-oss-safeguard on Product Hunt
“Open safety reasoning models with custom safety policies”
gpt-oss-safeguard launched on Product Hunt on October 30th, 2025 and earned 119 upvotes and 1 comments, placing #14 on the daily leaderboard. gpt-oss-safeguard is a new family of open-source safety models (120b & 20b) from OpenAI. They use reasoning to classify content based on a custom, developer-provided policy at inference time, providing an explainable chain-of-thought for each decision.
On the analytics side, gpt-oss-safeguard competes within Open Source, Artificial Intelligence and Development — topics that collectively have 543.2k followers on Product Hunt. The dashboard above tracks how gpt-oss-safeguard performed against the three products that launched closest to it on the same day.
Who hunted gpt-oss-safeguard?
gpt-oss-safeguard was hunted by Zac Zuo. A “hunter” on Product Hunt is the community member who submits a product to the platform — uploading the images, the link, and tagging the makers behind it. Hunters typically write the first comment explaining why a product is worth attention, and their followers are notified the moment they post. Around 79% of featured launches on Product Hunt are self-hunted by their makers, but a well-known hunter still acts as a signal of quality to the rest of the community. See the full all-time top hunters leaderboard to discover who is shaping the Product Hunt ecosystem.
Hi everyone!
OpenAI is offering a new direction for safety alignment on their open-weight models.
It basically decouples the safety policy from the model's execution, which gives developers a much more "white-box" environment. You can even review the model's chain-of-thought to see why it made a decision.
The main benefit is that you can rapidly calibrate safety rules just by editing the policy text, without having to train a whole new classifier (which costs a lot of time and money).
Big deal for application safety, especially with all the legal risks around AI emerging.
You can try it out with the examples OpenAI provides, or just feed it your own policy to see how it works.