Interactive Proving, Tactic Composition, Proof Automation, Mathlib
Human reward hacking
danmackinlay.name·1d
Effective Code Reviews with Conventional Comments • Paul Slaughter & Adrienne Braganza
youtube.com·1d
PromptCOS: Towards System Prompt Copyright Auditing for LLMs via Content-level Output Similarity
arxiv.org·3d
Know When to Explore: Difficulty-Aware Certainty as a Guide for LLM Reinforcement Learning
arxiv.org·4d