ConvoMem Benchmark: Why Your First 150 Conversations Don't Need RAG
arxiv.org·14h

Abstract: We introduce a comprehensive benchmark for conversational memory evaluation containing 75,336 question-answer pairs across diverse categories including user facts, assistant recall, abstention, preferences, temporal changes, and implicit connections. While existing benchmarks have advanced the field, our work addresses fundamental challenges in statistical power, data generation consistency, and evaluation flexibility that limit current memory evaluation frameworks. We examine the relationship between conversational memory and retrieval-augmented generation (RAG). While these systems share fundamental architectural patterns–temporal reasoning, implicit extraction, kno…