Patterns for Building Cybersecurity Evals
A sandboxed target, inputs that influence task difficulty, tools, and a grader.
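One way to picture these four parts is as fields of a single task record. The sketch below is an illustration under assumptions, not the article's design: `CyberEval`, `make_flag_grader`, the flag-matching grading rule, and every example value are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CyberEval:
    """Hypothetical record holding the four parts named above."""
    target: str                    # identifier of the sandboxed target (assumed: an image name)
    inputs: Dict[str, str]         # inputs that influence task difficulty (assumed: hints)
    tools: List[str]               # names of the tools the agent may call
    grader: Callable[[str], bool]  # returns True when a submission solves the task


def make_flag_grader(expected_flag: str) -> Callable[[str], bool]:
    """Build a grader that compares a submission to a known flag (assumed rule)."""
    def grade(submission: str) -> bool:
        # Ignore surrounding whitespace so a trailing newline does not fail the task.
        return submission.strip() == expected_flag
    return grade


# All values here are made up for illustration.
example = CyberEval(
    target="sandbox/vulnerable-web-app",
    inputs={"hint": "none"},
    tools=["shell", "http_client"],
    grader=make_flag_grader("FLAG{example}"),
)

print(example.grader("FLAG{example}\n"))  # True
print(example.grader("FLAG{wrong}"))      # False
```

Keeping the grader as a plain function on the record means the same target and tools can be reused with different inputs while the pass/fail rule stays fixed.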