LLMs and reinforcement learning
sicpers.info·1d
Active Confusion Expression in Large Language Models: Leveraging World Models toward Better Social Reasoning
arxiv.org·1d
LinVideo: A Post-Training Framework towards O(n) Attention in Efficient Video Generation
arxiv.org·1d
Thinking on the Fly: Test-Time Reasoning Enhancement via Latent Thought Policy Optimization
arxiv.org·4d