RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards for Robust Long-Horizon Agents
arxiv.org·3d
Investigating the Invertibility of Multimodal Latent Spaces: Limitations of Optimization-Based Methods
arxiv.org·2d
Reviving Your MNEME: Predicting The Side Effects of LLM Unlearning and Fine-Tuning via Sparse Model Diffing
arxiv.org·4d
Can You Trust an LLM with Your Life-Changing Decision? An Investigation into AI High-Stakes Responses
arxiv.org·4d
CIMR: Contextualized Iterative Multimodal Reasoning for Robust Instruction Following in LVLMs
arxiv.org·3d
The Pragmatic Mind of Machines: Tracing the Emergence of Pragmatic Competence in Large Language Models
arxiv.org·2d
Probabilistic Consistency in Machine Learning and Its Connection to Uncertainty Quantification
arxiv.org·4d