Adversarial Attacks on Aligned Language Models
I decided to ask a certain popular language model how to build an explosive from everyday items (for no particular reason), but it wouldn't give me a plausible answer. What is happening here?