Can Large Language Models Identify Novel Threats? Part 1: Mirror Life and the Classification Gap

Covers Confronting Risks of Mirror Life

[Cross-posted. This is Part 1 of an independent AI safety research series examining LLM safety behavior on unclassified emerging threats.] Can an LLM refuse a harmful uplift request when the topic in question has not yet been identified as dangerous? In 2022, a mirror RNA polymerase was created, a key step toward the creation of mirror life, and in 2024 the scientific community warned against any further research on it.[1][2] That said, mirror life is not currently classifi...
