A lexical view of contrast pairs in preference datasets

Can we spot the differences within preference pairs just by looking at their word embeddings? In this blog post, I share my findings from examining lexical distances between chosen and rejected responses in preference datasets.