Adaptive Loss Balancing for Noise-Robust GRPO in Generative Recommendation

Search Quality · Content type: Academic

arxiv.org · Cited by 1 article

Reinforcement learning (RL) presents a promising avenue for enhancing generative recommendation beyond supervised imitation, leveraging reward signals to guide policy improvement. However, its efficacy is critically contingent on the trustworthiness of the reward model for the samples it evaluates. In practice, production rankers, the widely adopted reward models, are trained on exposure-biased logs, leading to sample-dependent inaccuracies th...


Cited by 1 article

In other languages

AI researchers advance recommendation systems with LLM methodologies

kite.kagi.com