Mult-DPO: Multinomial Direct Preference Optimization for Recommender Systems

arxiv.org

Direct preference optimization (DPO) is a simple and effective alignment strategy for large language models (LLMs) based on pairwise preferences. In recommender systems, however, user feedback is rarely pairwise. For a given context, e.g., a user, a session, or a conversation, we typically observe set-wise preferences with multiple positive items, where every positive item should outrank every unobserved or explicitly negative item, with no pres...
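To make the mismatch between pairwise and set-wise feedback concrete, the sketch below implements the standard pairwise DPO loss and the naive baseline of expanding a set-wise observation into every (positive, negative) pair. This is not the Mult-DPO objective, which the truncated abstract above does not state. The function names, the `beta` default, and the toy margins are assumptions made for illustration only.

```python
import math
from itertools import product


def dpo_pair_loss(pos_margin, neg_margin, beta=0.1):
    """Standard pairwise DPO loss for one (preferred, dispreferred) pair.

    Each margin is log pi_theta(item | context) - log pi_ref(item | context).
    The beta value is an illustrative assumption, not taken from the paper.
    """
    z = beta * (pos_margin - neg_margin)
    # -log(sigmoid(z)) == log(1 + exp(-z)), written in a numerically stable form.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


def all_pairs_loss(pos_margins, neg_margins, beta=0.1):
    """Naive set-wise baseline (hypothetical helper): average the pairwise
    loss over every positive x negative combination for one context."""
    pairs = list(product(pos_margins, neg_margins))
    return sum(dpo_pair_loss(p, n, beta) for p, n in pairs) / len(pairs)


# Toy context: 2 positive items and 3 negative items give 2 * 3 = 6 pairs.
positives = [1.5, 0.8]
negatives = [-0.2, 0.1, -1.0]
loss = all_pairs_loss(positives, negatives)
```

The pair count grows as the product of the two set sizes, which is one reason a pairwise expansion becomes awkward when a context has many positives and a large pool of unobserved items.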

