OpenSIR: Open-Ended Self-Improving Reasoner
arxiv.org


Abstract: Recent advances in large language model (LLM) reasoning through reinforcement learning rely on annotated datasets for verifiable rewards, which may limit models’ ability to surpass human-level performance. While self-play offers a promising alternative, existing approaches depend on external verifiers or cannot learn open-endedly. We present Open-Ended Self-Improving Reasoner (OpenSIR), a self-play framework where an LLM learns to generate and solve novel problems by alternating teacher and student roles without external supervision. To generate novel problems, OpenSIR optimises for both difficulty and diversity, …
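The abstract is cut off before it describes the reward, so what follows is only a hypothetical sketch of how a teacher's reward might combine difficulty and diversity without an external verifier. It is not OpenSIR's actual formulation. Every piece here is an assumption: the majority-vote pseudo-label used as a proxy for the student's solve rate, the bell-shaped difficulty term that peaks when the student succeeds about half the time, the embedding-based novelty measure, the mixing weight `alpha`, and all function names.

```python
from collections import Counter
import math


def agreement_rate(answers: list[str]) -> float:
    """Fraction of sampled student answers that match the majority answer.

    Assumption: with no external verifier, the majority answer stands in for
    the correct one, so this rate serves as a proxy for the student's solve rate.
    """
    if not answers:
        return 0.0
    _, majority_count = Counter(answers).most_common(1)[0]
    return majority_count / len(answers)


def difficulty_score(solve_rate: float) -> float:
    """Hypothetical shaping: 1.0 at a 50% solve rate, 0.0 at 0% or 100%."""
    return 4.0 * solve_rate * (1.0 - solve_rate)


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def diversity_score(embedding: list[float], archive: list[list[float]]) -> float:
    """1 minus the highest similarity to any previously generated problem.

    Assumption: problems are compared via hypothetical embedding vectors.
    """
    if not archive:
        return 1.0
    return 1.0 - max(cosine(embedding, other) for other in archive)


def teacher_reward(
    answers: list[str],
    embedding: list[float],
    archive: list[list[float]],
    alpha: float = 0.5,
) -> float:
    """Hypothetical weighted blend of difficulty and diversity for one problem."""
    difficulty = difficulty_score(agreement_rate(answers))
    diversity = diversity_score(embedding, archive)
    return alpha * difficulty + (1.0 - alpha) * diversity


if __name__ == "__main__":
    # Four student samples, three agree: solve-rate proxy 0.75, difficulty 0.75.
    # The new problem is orthogonal to the only archived one: diversity 1.0.
    print(teacher_reward(["4", "4", "5", "4"], [1.0, 0.0], [[0.0, 1.0]]))  # 0.875
```

The bell-shaped difficulty term is one common way to avoid rewarding problems that are trivially easy or unsolvable; a linear "1 minus solve rate" term would instead reward nonsense problems on which the student never agrees with itself.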
