Measuring AI Ability to Complete Long Software Tasks
This paper from METR (Model Evaluation & Threat Research) introduces a new metric for tracking AI progress: the "50%-task-completion time h...
Read the original article