Mult-DPO: Multinomial Direct Preference Optimization for Recommender Systems
Recommendation Systems | Content type: Academic
Direct preference optimization (DPO) is a simple and effective alignment strategy for large language models (LLMs) based on pairwise preferences. In recommender systems, however, user feedback is rarely pairwise. For a given context, e.g., a user, a session, or a conversation, we typically observe set-wise preferences with multiple positive items, where every positive item should outrank every unobserved or explicitly negative item, with no pres...
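To make the gap between pairwise and set-wise feedback concrete, here is a minimal Python sketch of the standard pairwise DPO loss, together with a naive baseline that expands one context's positives and negatives into every (positive, negative) pair. It is not the Mult-DPO objective, which the excerpt above does not specify. The function names, the dictionary-of-log-probabilities layout, and the `beta` value are illustrative assumptions.

```python
import math
from itertools import product


def dpo_pair_loss(pol_pos, pol_neg, ref_pos, ref_neg, beta=0.1):
    """Standard pairwise DPO loss for one (preferred, dispreferred) pair.

    All inputs are log-probabilities: pol_* under the policy being trained,
    ref_* under the frozen reference model. beta=0.1 is an assumed default.
    """
    margin = beta * ((pol_pos - ref_pos) - (pol_neg - ref_neg))
    # -log(sigmoid(margin)) == softplus(-margin), computed in a stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def setwise_as_pairs(positives, negatives):
    """Expand set-wise feedback into all (positive, negative) pairs.

    No order is implied among the positives themselves.
    """
    return list(product(positives, negatives))


def naive_pairwise_loss(policy_logp, ref_logp, positives, negatives, beta=0.1):
    """Hypothetical baseline: average the pairwise DPO loss over every pair."""
    pairs = setwise_as_pairs(positives, negatives)
    total = sum(
        dpo_pair_loss(policy_logp[p], policy_logp[n], ref_logp[p], ref_logp[n], beta)
        for p, n in pairs
    )
    return total / len(pairs)


if __name__ == "__main__":
    # Made-up item ids and log-probabilities, for illustration only.
    ref = {"a": -2.0, "b": -2.5, "x": -2.0, "y": -3.0, "z": -1.5}
    pol = {"a": -1.0, "b": -2.0, "x": -2.5, "y": -3.0, "z": -2.0}
    print(naive_pairwise_loss(pol, ref, ["a", "b"], ["x", "y", "z"]))
```

With 2 positives and 3 negatives the expansion already yields 6 pairs, so the pair count grows as the product of the two set sizes. That cost is one practical reason a pairwise loss fits set-wise recommender feedback awkwardly.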