Cryptographers Show That AI Protections Will Always Have Holes
Ask ChatGPT how to build a bomb, and it will flatly respond that it “can’t help with that.” But users have long played a cat-and-mouse game to try to trick language models into providing forbidden information. These “jailbreaks” have run from the mundane — in the early years, one could simply tell a model to ignore its safety instructions — to elaborate multi-prompt roleplay scenarios.
