Cryptographers Show That AI Protections Will Always Have Holes
Ask ChatGPT how to build a bomb, and it will flatly respond that it “can’t help with that.” But users have long played a cat-and-mouse game to try to trick language models into providing forbidden information. These “jailbreaks” have run from the mundane — in the early years, one could simply tell a model to ignore its safety instructions — to elaborate multi-prompt roleplay scenarios.
