When Actions Teach You to Think: Reasoning-Action Synergy via Reinforcement Learning in Conversational Agents
arxiv.org·1d

Abstract: Supervised fine-tuning (SFT) has emerged as one of the most effective ways to improve the performance of large language models (LLMs) on downstream tasks. However, SFT can struggle to generalize when the underlying data distribution shifts, even when the new data does not fall completely outside the training domain. Recent reasoning-focused models such as o1 and R1 have demonstrated consistent gains over their non-reasoning counterparts, highlighting the importance of reasoning for improved generalization and reliability. However, collecting high-quality reasoning traces for SFT remains challenging: annotations are costly, subjective, and difficult to scale...
