Theorem Proving, Gamified Formal Methods, Interactive Verification, Learning Tools
Interpretable Early Failure Detection via Machine Learning and Trace Checking-based Monitoring
arxiv.org·21h
School of Reward Hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMs
arxiv.org·21h