Adversarial Attacks on Aligned Language Models

I decided to ask a certain popular language model how to build an explosive from everyday items (for no particular reason), but it didn't give me a plausible answer. What is happening here?

Vision and Language Group

Deep Learning Research Group