Beyond Uniform Forgetting: A Study of Sequential Direct Preference Optimization Across Preference Settings

Aligning language models with human preferences often requires optimising multiple behavioural objectives. A practical approach is to apply these objectives sequentially using preference optimisation methods such as Direct Preference Optimisation (DPO), but it remains unclear whether later training uniformly degrades preferences learned earlier or whether the effect depends on the relationship between objectives. We study sequential DPO across...
