From punch cards to prompts: a history of how software got better
stackoverflow.blog·2d
What Matters in Data for DPO?
arxiv.org·2d
School of Reward Hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMs
arxiv.org·3d