Several frontier models are substantially prefill aware
This blog post discusses work in a recently published paper. However, the post was primarily written by Parv Mahajan and Andy Wang, and several of the more speculative takes may not represent the all-things-considered view of the entire team. We provide more conceptual grounding, extend results on prefill awareness to low-stakes settings, and find that several frontier models exhibit prefill awareness even under conservative elicitation. Further behavioral studies ...