How Does Reasoning Flow? Tracing Attention-Induced Information Flow for Targeted RL in LLMs

arxiv.org · Covered by ai-brief.liziran.com

Token-level credit assignment remains a key obstacle for reinforcement learning (RL) in large language models (LLMs), where RL recipes typically treat all tokens equally, failing to distinguish decisive reasoning steps from routine formatting or fluent filler. Recent attempts leverage model-internal signals to assign finer-grained credit, but these are often point-wise heuristics that ignore the global structure of information propagation. We pr...

