Mitigating Self-Preference by Authorship Obfuscation
arxiv.org·6h

Abstract: Language model (LM) judges are widely used to evaluate the quality of LM outputs. Despite their many advantages, LM judges display concerning biases that can impair the integrity of their evaluations. One such bias is self-preference: LM judges prefer their own answers over those produced by other LMs or humans. The bias is hard to eliminate because frontier LM judges can distinguish their own outputs from those of others, even when the evaluation candidates are not labeled with their sources. In this paper, we investigate strategies to mitigate self-preference by reducing the LM judges’ ability to recognize their …
