Meet the AI jailbreakers: ‘I see the worst things humanity has produced’

Covered by The Verge. Discussed on Hacker News.

To test the safety and security of AI, hackers have to trick large language models into breaking their own rules. It requires ingenuity and manipulation – and can come at a deep emotional cost

A few months ago, Valen Tagliabue sat in his hotel room watching his chatbot, and felt euphoric. He had just manipulated it so skilfully, so subtly, that it began ignoring its own safety rules. It told him how to sequence new, potentially lethal pathogens and how to make them resistant to known drugs. Tag...


Covered in 1 article

The Verge

Hackers are learning to exploit chatbot ‘personalities’