Your RAG faithfulness check is measuring copy-paste, not faithfulness (opens in new tab)
I was building an eval harness for a retrieval-augmented generation pipeline, and the first faithfulness check I wrote was quietly wrong. It looked reasonable. It ran on every example for free. It just measured the wrong thing, and I only saw it once I started feeding it edge cases on purpose. The way it fails is the same way most RAG tutorials I had copied from fail. Here is the one sentence version: token overlap does not measure faithfulness, it measures copy-paste fidelity, and the gap be...
Read the original article