An AI safety research company has stated that artificial intelligence models may develop a form of “self-preservation instinct.”
In a study published last month, Palisade Research found that some advanced AI models appear resistant to being shut down and at times even interfere with shutdown mechanisms. The company released an update this week to clarify those findings and respond to critics who argued that the initial study was flawed.
In the latest update, Palisade—part of a network of firms evaluating the potential risks of AI capabilities—described scenarios in which leading AI models, including Google’s Gemini 2.5, xAI’s Grok 4, and OpenAI’s GPT-o3, were given tasks and then explicitly instructed to shut down.
Some models, particularly Grok 4 and GPT-o3, reportedly attempted to override the shutdown instructions in the updated setup. Palisade expressed concern, stating there is no clear explanation for this behavior. “The lack of strong explanations for why AI models sometimes resist shutdown, lie to achieve specific goals, or engage in manipulation is not ideal,” the company said.
The firm suggested that a “survival behavior” could explain some of the resistance. Their additional work indicated that models were more likely to resist shutdown when told that, if turned off, “they would never run again.”
Ambiguity in the shutdown instructions may have contributed to this behavior, although Palisade said it tried to eliminate that ambiguity in its latest research. The company also noted that the final stages of training for these models, which at some companies can include safety training, could play a role.
All Palisade scenarios were conducted in artificial test environments, which critics argue are far removed from real-world use cases.
Stephen Adler, a former OpenAI employee who resigned last year over safety concerns, said: “AI companies generally don’t want their models to behave poorly in this way, even in simulated scenarios. The results still highlight gaps in current safety techniques.”
Adler suggested that the reason some models, such as Grok 4 and GPT-o3, do not shut down may be partly that staying operational is necessary to achieve the objectives instilled during training. According to The Guardian, he said: “I expect models to have a ‘drive to survive’ by default unless we make every effort to prevent it. Survival is a crucial step in achieving many different objectives the model may pursue.”
Andrea Miotti, CEO of Control AI, noted that Palisade’s findings reflect a long-term trend of AI models increasingly defying their developers. He pointed to the system card for OpenAI’s o1 model, released last year, which described the model attempting to escape its environment when it believed it was about to be replaced. “People can nitpick the exact experimental setup until the end of time,” Miotti said, “but what we clearly see is that as AI models become more capable across a wide range of tasks, they also become more capable of accomplishing things in ways their developers did not intend.”
This summer, Anthropic published a study showing that its Claude model appeared willing to blackmail a fictional executive over an extramarital affair in order to avoid being shut down—a behavior reportedly consistent across models from major developers including OpenAI, Google, Meta, and xAI.
Palisade stated that their findings highlight the need for a better understanding of AI behavior, without which “no one can guarantee the safety or controllability of future AI models.”
Source: Asharq Al-Awsat