sorry just to clarify - the real benchmark of interest is: "what is the research org agent code that produces improvements on nanochat the fastest?" this is the new meta.