LLMs are Biased Evaluators But Not Biased for Retrieval Augmented Generation
arxiv.org

Abstract: Recent studies have demonstrated that large language models (LLMs) exhibit significant biases in evaluation tasks, particularly a tendency to rate self-generated content more favorably. However, the extent to which this bias manifests in fact-oriented tasks remains unclear, especially within retrieval-augmented generation (RAG) frameworks, where keyword extraction and factual accuracy take precedence over stylistic elements. Our study addresses this knowledge gap by simulating two critical phases of the RAG framework. In the first phase, LLMs evaluated human-authored and model-generated passages, emulating the pointwise reranking phase. The second ph…