Chunking in RAG: why your splitter matters more than your embedding model (opens in new tab)
Most RAG retrieval problems I've debugged came down to the same thing: someone swapped the embedding model three times, added a reranker, then gave up — and never once changed the chunker. This is backwards. The chunker decides what your embedding model is allowed to see. A great embedding on a bad chunk is still a bad retrieval. And the published research from the last 18 months keeps pointing at the same conclusion: the "smart" chunking strategies don't beat a tuned dumb one. What does beat...
Read the original article