FrontierCyber: Bringing Offensive Cyber Evaluations to Real Systems
FrontierCyber is a new benchmark that turns real-world systems into offensive cyber capability evaluations for AI models. Models are given verifiable objectives on real systems, from software and databases to mobile phones and routers, with no planted vulnerabilities or predefined exploit paths. Across a large, diverse suite of advanced challenges, FrontierCyber produces a repeatable and comparable capability signal.