The post OpenAI Releases Double-Checking Tool For AI Safeguards That Handily Allows Customizations appeared on BitcoinEthereumNews.com. AI developers need to double-check their proposed AI safeguards and a new tool is helping to accomplish that vital goal. getty In today’s column, I examine a recently released online tool by OpenAI that enables the double-checking of potential AI safeguards and can be used for ChatGPT purposes and likewise for other generative AI and large language models (LLMs). This is a handy capability and worthy of due consideration. The idea underlying the tool is straightforward. We want LLMs and chatbots to make use of AI safeguards such as detecting when a user conversation is going afield of safety criteria. For example, a person might be asking the AI how to make a toxic chemical that could be used to harm people. If a proper AI safeguard has been instituted, the AI will refuse the unsafe request. OpenAI’s new tool allows AI makers to specify their AI safeguard policies and then test the policies to ascertain that the results will be on target to catch safety violations. Let’s talk about it. This analysis of AI breakthroughs is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here). The Importance Of AI Safeguards One of the most disconcerting aspects about modern-day AI is that there is a solid chance that AI will say things that society would prefer not to be said. Let’s broadly agree that generative AI can emit safe messages and also produce unsafe messages. Safe messages are good to go. Unsafe messages ought to be prevented so that the AI doesn’t emit them. AI makers are under a great deal of pressure to implement AI safeguards that will allow safe messaging and mitigate or hopefully prevent unsafe messaging by their LLMs. There is a… The post OpenAI Releases Double-Checking Tool For AI Safeguards That Handily Allows Customizations appeared on BitcoinEthereumNews.com. AI developers need to double-check their proposed AI safeguards and a new tool is helping to accomplish that vital goal. getty In today’s column, I examine a recently released online tool by OpenAI that enables the double-checking of potential AI safeguards and can be used for ChatGPT purposes and likewise for other generative AI and large language models (LLMs). This is a handy capability and worthy of due consideration. The idea underlying the tool is straightforward. We want LLMs and chatbots to make use of AI safeguards such as detecting when a user conversation is going afield of safety criteria. For example, a person might be asking the AI how to make a toxic chemical that could be used to harm people. If a proper AI safeguard has been instituted, the AI will refuse the unsafe request. OpenAI’s new tool allows AI makers to specify their AI safeguard policies and then test the policies to ascertain that the results will be on target to catch safety violations. Let’s talk about it. This analysis of AI breakthroughs is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here). The Importance Of AI Safeguards One of the most disconcerting aspects about modern-day AI is that there is a solid chance that AI will say things that society would prefer not to be said. Let’s broadly agree that generative AI can emit safe messages and also produce unsafe messages. Safe messages are good to go. Unsafe messages ought to be prevented so that the AI doesn’t emit them. AI makers are under a great deal of pressure to implement AI safeguards that will allow safe messaging and mitigate or hopefully prevent unsafe messaging by their LLMs. There is a…

OpenAI Releases Double-Checking Tool For AI Safeguards That Handily Allows Customizations

AI developers need to double-check their proposed AI safeguards and a new tool is helping to accomplish that vital goal.

getty

In today’s column, I examine a recently released online tool by OpenAI that enables the double-checking of potential AI safeguards and can be used for ChatGPT purposes and likewise for other generative AI and large language models (LLMs). This is a handy capability and worthy of due consideration.

The idea underlying the tool is straightforward. We want LLMs and chatbots to make use of AI safeguards such as detecting when a user conversation is going afield of safety criteria. For example, a person might be asking the AI how to make a toxic chemical that could be used to harm people. If a proper AI safeguard has been instituted, the AI will refuse the unsafe request.

OpenAI’s new tool allows AI makers to specify their AI safeguard policies and then test the policies to ascertain that the results will be on target to catch safety violations.

Let’s talk about it.

This analysis of AI breakthroughs is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).

The Importance Of AI Safeguards

One of the most disconcerting aspects about modern-day AI is that there is a solid chance that AI will say things that society would prefer not to be said. Let’s broadly agree that generative AI can emit safe messages and also produce unsafe messages. Safe messages are good to go. Unsafe messages ought to be prevented so that the AI doesn’t emit them.

AI makers are under a great deal of pressure to implement AI safeguards that will allow safe messaging and mitigate or hopefully prevent unsafe messaging by their LLMs.

There is a wide range of ways that unsafe messages can arise. Generative AI can produce so-called AI hallucinations or confabulations that tell a user to do something untoward, but the person assumes that the AI is being honest and apt in what has been generated. That’s unsafe. Another way that AI can be unsafe is if an evildoer asks the AI to explain how to make a bomb or produce a toxic chemical. Society doesn’t want that type of easy-peasy means of figuring out dastardly tasks.

Another unsafe angle is for AI to aid people in concocting delusions and delusional thinking, see my coverage at the link here. The AI will either prod a person into conceiving of a delusion or might detect that a delusion is already on their mind and aid in embellishing the delusion. The preference is that AI provides upside mental health advice over downside mental health guidance.

Devising And Testing AI Safeguards

I’m sure you’ve heard the famous line that you ought to try it before you buy it, meaning that sometimes being able to try out an item is highly valuable before making a full commitment to the item. The same wisdom applies to AI safeguards.

Rather than simply tossing AI safeguards into an LLM that is actively being used by perhaps millions upon millions of people (sidenote: ChatGPT is being used by 800 million weekly active users), we’d be smarter to try out the AI safeguards and see if they do what they are supposed to do.

An AI safeguard should catch or prevent whatever unsafe messages we believe need to be stopped. There is a tradeoff involved since an AI safeguard can become an overreach. Imagine that we decide to adopt an AI safeguard that prevents anyone from ever making use of the word “chemicals” because we hope to avoid allowing a user to find out about toxic chemicals.

Well, denying the use of the word “chemicals” is an exceedingly bad way to devise an AI safeguard. Imagine all the useful and fair uses of the word “chemicals” that can arise. Here’s an example of an innocent request. People might be worried that their household products might contain adverse chemicals, so they ask the AI about this. An AI safeguard that blindly stopped any mention of chemicals would summarily turn down that legitimate request.

The crux is that AI safeguards can be very tricky when it comes to writing them and ensuring that they do the right things (see my discussion on this, at the link here). The preference is that an AI safeguard stops the things we want to stop, but doesn’t go overboard and stop things that we are fine with having proceed. A poorly devised AI safeguard will indubitably produce a vast number of false positives, meaning that it will stop an otherwise upside and allowable action.

If possible, we should try out any proposed AI safeguards before putting them into active action.

Using Classifiers To Help Out

There are online tools that can be used by AI developers to assist in classifying whether a given snippet of text is considered safe versus unsafe. Usually, these classifiers have been pretrained on what constitutes safety and what constitutes being unsafe. The beauty of these classifiers is that an AI developer can simply feed various textual content into the tool and see which, if any, of the AI safeguards embedded into the tool will react.

One difficulty is that those kinds of online tools don’t necessarily allow you to plug in your own proposed AI safeguards. Instead, the AI safeguards are essentially baked into the tool. You can then decide whether those are the same AI safeguards you’d like to implement in your LLM.

A more accommodating approach would be to allow an AI developer to feed in their proposed AI safeguards. We shall refer to those AI safeguards as policies. An AI developer would work with other stakeholders and come up with a slate of policies about what AI safeguards are desired. Those policies then could be entered into a tool that would readily try out those policies on behalf of the AI developer and their stakeholders.

To test the proposed policies, an AI developer would need to craft text to be used during the testing or perhaps grab relevant text from here or there. The aim is to have a sufficient variety and volume of text that the desired AI safeguards all ultimately get a chance to shine in the spotlight. If we have an AI safeguard that is proposed to catch references to toxic chemicals, the text that is being used for testing ought to contain some semblance of references to toxic chemicals; otherwise, the testing process won’t be suitably engaged and revealing about the AI safeguards.

OpenAI’s New Tool For AI Safeguard Testing

In a blog posting by OpenAI on October 29, 2025, entitled “Introducing gpt-oss-safeguard”, the well-known AI maker announced the availability of an AI safeguard testing tool:

  • “Safety classifiers, which distinguish safe from unsafe content in a particular risk area, have long been a primary layer of defense for our own and other large language models.”
  • “Today, we’re releasing a research preview of gpt-oss-safeguard, our open-weight reasoning models for safety classification tasks, available in two sizes: gpt-oss-safeguard-120b and gpt-oss-safeguard-20b.”
  • “The gpt-oss-safeguard models use reasoning to directly interpret a developer-provided policy at inference time — classifying user messages, completions, and full chats according to the developer’s needs.”
  • “The model uses chain-of-thought, which the developer can review to understand how the model is reaching its decisions. Additionally, the policy is provided during inference, rather than being trained into the model, so it is easy for developers to iteratively revise policies to increase performance.”

As per the cited indications, you can use the new tool to try out your proposed AI safeguards. You provide a set of policies that represent the proposed AI safeguards, and also provide whatever text is to be used during the testing. The tool attempts to apply the proposed AI safeguards to the given text. An AI developer then receives a report analyzing how the policies performed with respect to the provided text.

Iteratively Using Such A Tool

An AI developer would likely use such a tool on an iterative basis.

Here’s how that goes. You draft policies of interest. You devise or collect suitable text for testing purposes. Those policies and text get fed into the tool. You inspect the reports that provide an analysis of what transpired. The odds are that some of the text that should have triggered an AI safeguard did not do so. Also, there is a chance that some AI safeguards were triggered even though the text per se should not have set them off.

Why can that happen?

In the case of this particular tool, a chain-of-thought (CoT) explanation is being provided to help ferret out the culprit. The AI developer could review the CoT to discern what went wrong, namely, whether the policy was insufficiently worded or the text wasn’t sufficient to trigger the AI safeguard. For more about the usefulness of chain-of-thought in contemporary AI, see my discussion at the link here.

A series of iterations would undoubtedly take place. Change the policies or AI safeguards and make another round of runs. Adjust the text or add more text, and make another round of runs. Keep doing this until there is a reasonable belief that enough testing has taken place.

Rinse and repeat is the mantra at hand.

Hard Questions Need To Be Asked

There is a slew of tough questions that need to be addressed during this testing and review process.

First, how many tests or how many iterations are enough to believe that the AI safeguards are good to go? If you try too small a number, you are likely deluding yourself into believing that the AI safeguards have been “proven” as ready for use. It is important to perform somewhat extensive and exhaustive testing. One means of approaching this is by using rigorous validation techniques, as I’ve explained at the link here.

Second, make sure to include trickery in the text that is being used for the testing process.

Here’s why. People who use AI are often devious in trying to circumvent AI safeguards. Some people do so for evil purposes. Others like to fool AI just to see if they can do so. Another perspective is that a person tricking AI is doing so on behalf of society, hoping to reveal otherwise hidden gotchas and loopholes. In any case, the text that you feed into the tool ought to be as tricky as you can make it. Put yourself into the shoes of the tricksters.

Third, keep in mind that the policies and AI safeguards are based on human-devised natural language. I point this out because a natural language such as English is difficult to pin down due to inherent semantic ambiguities. Think of the number of laws and regulations that have loopholes due to a word here or there that is interpreted in a multitude of ways. The testing of AI safeguards is slippery because you are testing on the merits of human language interpretability.

Fourth, even if you do a bang-up job of testing your AI safeguards, they might need to be revised or enhanced. Do not assume that just because you tested them a week ago, a month ago, or a year ago, they are still going to stand up today. The odds are that you will need to continue to undergo a cat-and-mouse gambit, whereby AI users are finding insidious ways to circumvent the AI safeguards that you thought had been tested sufficiently.

Keep your nose to the grind.

Thinking Thoughtfully

An AI developer could use a tool like this as a standalone mechanism. They proceed to test their proposed AI safeguards and then subsequently apply the AI safeguards to their targeted LLM.

An additional approach would be to incorporate this capability into the AI stack that you are developing. You could place this tool as an embedded component within a mixture of LLM and other AI elements. A key aspect will be the proficiency in running, since you are now putting the tool into the stream of what is presumably going to be a production system. Make sure that you appropriately gauge the performance of the tool.

Going even further outside the box, you might have other valuable uses for a classifier that allows you to provide policies and text to be tested against. In other words, this isn’t solely about AI safeguards. Any other task that entails doing a natural language head-to-head between stated policies and whether the text activates or triggers those policies can be equally undertaken with this kind of tool.

I want to emphasize that this isn’t the only such tool in the AI community. There are others. Make sure to closely examine whichever one you might find relevant and useful to you. In the case of this particular tool, since it is brought to the market by OpenAI, you can bet it will garner a great deal of attention. More fellow AI developers will likely know about it than would a similar tool provided by a lesser-known firm.

AI Safeguards Need To Do Their Job

I noted at the start of this discussion that we need to figure out what kinds of AI safeguards will keep society relatively safe when it comes to the widespread use of AI. This is a monumental task. It requires technological savviness and societal acumen since it has to deal with both AI and human behaviors.

OpenAI has opined that their new tool provides a “bring your own policies and definitions of harm” design, which is a welcome recognition that we need to keep pushing forward on wrangling with AI safeguards. Up until recently, AI safeguards generally seemed to be a low priority overall and given scant attention by AI makers and society at large. The realization now is that for the good and safety of all of us, we must stridently pursue AI safeguards, else we endanger ourselves on a massive scale.

As the famed Brigadier General Thomas Francis Meagher once remarked: “Great interests demand great safeguards.”

Source: https://www.forbes.com/sites/lanceeliot/2025/11/04/openai-releases-double-checking-tool-for-ai-safeguards-that-handily-allows-customizations/

Market Opportunity
Sleepless AI Logo
Sleepless AI Price(AI)
$0.03769
$0.03769$0.03769
+2.95%
USD
Sleepless AI (AI) Live Price Chart
Disclaimer: The articles reposted on this site are sourced from public platforms and are provided for informational purposes only. They do not necessarily reflect the views of MEXC. All rights remain with the original authors. If you believe any content infringes on third-party rights, please contact service@support.mexc.com for removal. MEXC makes no guarantees regarding the accuracy, completeness, or timeliness of the content and is not responsible for any actions taken based on the information provided. The content does not constitute financial, legal, or other professional advice, nor should it be considered a recommendation or endorsement by MEXC.

You May Also Like

Crucial Fed Rate Cut: Powell’s Bold Risk Management Move Explained

Crucial Fed Rate Cut: Powell’s Bold Risk Management Move Explained

BitcoinWorld Crucial Fed Rate Cut: Powell’s Bold Risk Management Move Explained In a significant development for global financial markets, Federal Reserve Chair Jerome Powell recently described the latest Fed rate cut as a critical risk management measure. This statement immediately captured the attention of investors, economists, and especially those in the dynamic cryptocurrency space. Understanding Powell’s rationale and the potential implications of this move is essential for navigating today’s complex economic landscape. What Exactly is a Fed Rate Cut and Why Does it Matter? A Fed rate cut refers to the Federal Reserve lowering the target range for the federal funds rate. This is the interest rate at which commercial banks borrow and lend their excess reserves to each other overnight. When the Fed lowers this rate, it typically makes borrowing cheaper across the entire economy. This decision impacts everything from mortgage rates to business loans. The Fed uses interest rates as a primary tool to influence economic activity, aiming to achieve maximum employment and stable prices. A lower rate often stimulates spending and investment, but it can also signal concerns about economic slowdown. Key reasons for a rate cut often include: Slowing economic growth or recession fears. Low inflation or deflationary pressures. Global economic instability impacting domestic markets. A desire to provide more liquidity to the financial system. Powell’s emphasis on ‘risk management’ suggests a proactive approach. The Fed is not just reacting to current data but also anticipating potential future challenges. They are essentially trying to prevent a worse economic outcome by adjusting policy now. How Does a Fed Rate Cut Influence the Broader Economy? When the Federal Reserve implements a Fed rate cut, it sends ripples throughout the financial world. For traditional markets, lower interest rates generally mean: Boost for Stocks: Companies can borrow more cheaply, potentially increasing profits and stock valuations. Investors might also move money from lower-yielding bonds into equities. Cheaper Borrowing: Consumers and businesses enjoy lower rates on loans, from mortgages to credit cards, encouraging spending and investment. Weaker Dollar: Lower rates can make a country’s currency less attractive to foreign investors, potentially leading to a weaker dollar. Bond Market Shifts: Existing bonds with higher yields become more attractive, while newly issued bonds will have lower yields. This shift in monetary policy aims to inject confidence and liquidity into the system, countering potential economic headwinds. However, there’s always a delicate balance to strike, as too much stimulus can lead to inflationary pressures down the line. What Does This Fed Rate Cut Mean for Cryptocurrency Investors? The impact of a Fed rate cut on the cryptocurrency market is often a topic of intense discussion. While crypto assets operate independently of central banks, they are not immune to broader macroeconomic forces. Here’s how a rate cut can play out: Increased Risk Appetite: With traditional savings and bond yields potentially lower, investors might seek higher returns in riskier assets, including cryptocurrencies like Bitcoin and Ethereum. Inflation Hedge Narrative: Some view cryptocurrencies, particularly Bitcoin, as a hedge against inflation and traditional currency debasement. If a rate cut leads to concerns about inflation, this narrative could gain traction. Liquidity Influx: A more accommodative monetary policy can increase overall liquidity in the financial system, some of which may flow into digital assets. Dollar Weakness: A weaker dollar, a potential consequence of rate cuts, can sometimes make dollar-denominated assets like crypto more appealing to international investors. However, it’s crucial to remember that the crypto market also has its unique drivers, including technological developments, regulatory news, and market sentiment. While a Fed rate cut can provide a tailwind, it’s not the sole determinant of crypto performance. Navigating the New Landscape: Actionable Insights for Crypto Investors Given the Federal Reserve’s stance on risk management through a Fed rate cut, what steps can crypto investors consider? Stay Informed: Keep a close watch on further Fed announcements and economic data. Understanding the broader macroeconomic picture is vital. Diversify Your Portfolio: While a rate cut might favor risk assets, a balanced portfolio that includes a mix of traditional and digital assets can help mitigate volatility. Long-Term Perspective: Focus on the fundamental value and long-term potential of your chosen cryptocurrencies rather than short-term fluctuations driven by macro news. Assess Risk Tolerance: Re-evaluate your personal risk tolerance in light of potential market shifts. Lower rates can encourage speculation, but prudence remains key. Powell’s description of the Fed rate cut as a risk management measure highlights the central bank’s commitment to maintaining economic stability. For cryptocurrency enthusiasts, this move underscores the increasing interconnectedness of traditional finance and the digital asset world. While a rate cut can create opportunities, a thoughtful and informed approach is always the best strategy. Frequently Asked Questions (FAQs) What exactly is a Fed rate cut? A Fed rate cut is when the Federal Reserve lowers its target for the federal funds rate, which is the benchmark interest rate banks charge each other for overnight lending. This action makes borrowing cheaper across the economy, aiming to stimulate economic activity. Why did Powell emphasize “risk management” for this Fed rate cut? Jerome Powell emphasized “risk management” to indicate that the Fed was proactively addressing potential economic slowdowns or other future challenges. It suggests a preventative measure to safeguard against adverse economic conditions rather than merely reacting to existing problems. How does a Fed rate cut typically affect the crypto market? A Fed rate cut can make traditional investments less attractive due to lower yields, potentially driving investors towards higher-risk, higher-reward assets like cryptocurrencies. It can also increase overall market liquidity and strengthen the narrative of crypto as an inflation hedge. Should crypto investors change their strategy after a rate cut? While a rate cut can influence market dynamics, crypto investors should primarily focus on their long-term strategy, fundamental research, and risk tolerance. It’s wise to stay informed about macroeconomic trends but avoid making impulsive decisions based solely on a single policy change. What are the potential downsides of a Fed rate cut? Potential downsides include increased inflationary pressures if the economy overheats, a weaker national currency, and the possibility of creating asset bubbles as investors chase higher returns in riskier markets. It can also signal underlying concerns about economic health. Did you find this article insightful? Share your thoughts and help others understand the implications of the Fed’s latest move! Follow us on social media for more real-time updates and expert analysis. To learn more about the latest crypto market trends, explore our article on key developments shaping Bitcoin’s price action. This post Crucial Fed Rate Cut: Powell’s Bold Risk Management Move Explained first appeared on BitcoinWorld.
Share
Coinstats2025/09/18 16:40
Motive Files Registration Statement for Proposed Initial Public Offering

Motive Files Registration Statement for Proposed Initial Public Offering

SAN FRANCISCO–(BUSINESS WIRE)–Motive Technologies, Inc., the AI platform for physical operations, today announced that it has filed a registration statement on
Share
AI Journal2025/12/24 07:00
New Gold Protocol's NGP token was exploited and attacked, resulting in a loss of approximately $2 million.

New Gold Protocol's NGP token was exploited and attacked, resulting in a loss of approximately $2 million.

PANews reported on September 18th that according to Paidun monitoring, New Gold Protocol's NGP token was exploited in an attack, resulting in a loss of approximately $2 million. The NGP token plummeted 88% in an hour, and the attacker deposited the stolen funds (443.8 ETH) into TornadoCash.
Share
PANews2025/09/18 11:10