Chunking in RAG: why your splitter matters more than your embedding model

Covers: Introducing Contextual Retrieval
Discussed on DEV

Most RAG retrieval problems I've debugged came down to the same thing: someone swapped the embedding model three times, added a reranker, then gave up, and never once changed the chunker. This is backwards. The chunker decides what your embedding model is allowed to see, and a great embedding of a bad chunk still produces a bad retrieval. The published research from the last 18 months keeps pointing at the same conclusion: the "smart" chunking strategies don't beat a tuned dumb one. What does beat...
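The argument rests on one idea: the chunker, not the embedding model, decides what text each vector represents. Below is a minimal sketch of the kind of "dumb" chunker worth tuning: a fixed-size sliding window with overlap, plus a brute-force parameter sweep. Whitespace splitting stands in for a real tokenizer (an assumption). `chunk_words`, `sweep`, the default sizes, and the `evaluate` callback are hypothetical illustrations, not the article's code. `evaluate` is whatever retrieval metric you already trust, for example recall@k over a labeled query set.

```python
from itertools import product
from typing import Callable, Dict, Iterable, List, Tuple


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into fixed-size word windows that overlap by `overlap` words.

    Hypothetical helper: whitespace splitting approximates a tokenizer, and the
    default sizes are placeholders to be tuned, not recommendations.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a window reaches the end, so the tail is not
        # emitted again as a smaller, fully overlapped chunk.
        if start + chunk_size >= len(words):
            break
    return chunks


def sweep(
    text: str,
    evaluate: Callable[[List[str]], float],
    sizes: Iterable[int],
    overlaps: Iterable[int],
) -> Dict[Tuple[int, int], float]:
    """Score every valid (chunk_size, overlap) pair with a user-supplied eval.

    `evaluate` is a hypothetical callback: it receives the chunks and returns
    a retrieval score (higher is better) from your own benchmark.
    """
    results: Dict[Tuple[int, int], float] = {}
    for size, ov in product(list(sizes), list(overlaps)):
        if ov >= size:  # overlap must be smaller than the window
            continue
        results[(size, ov)] = evaluate(chunk_words(text, size, ov))
    return results
```

Plug your own retrieval eval into `evaluate` and keep the `(chunk_size, overlap)` pair that scores best. The sweep is deliberately brute force, since a grid of a few sizes and overlaps is cheap compared with re-embedding a corpus for every new model you try.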
