Dependent Types, Proving Program Correctness, Curry-Howard Isomorphism, Formal Verification, Curry-Howard Correspondence, Hindley-Milner, Polymorphism
RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards for Robust Long-Horizon Agents
arxiv.org·2d