Abstract page for arXiv paper 2305.18290: Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Press ? anytime to show this help