LLM agent safety, multi-turn red-teaming, jailbreak benchmarks, adversarial robustness, safety-critical systems

Covered by DEV Community

Large language model (LLM) agents are increasingly proposed as supervisory components for safety-critical systems, yet their robustness under sustained, adaptive adversarial pressure remains poorly characterized. We present NRT-Bench, a benchmark for multi-turn red-teaming of LLM agents acting as operators of a safety-critical system, instantiated in a simulated nuclear power plant control room. A five-role operator team, each backed by a conf...


Covered in 1 article

DEV Community

Securing LLM Agent Teams: Inside NRT-Defense v0.4.0
