The next generation of RAG: How PageIndex improves retrieval accuracy without semantic search
Reasoning-based RAG versus vector-based RAG. Image by the author
Retrieval-augmented generation (RAG) supplements an LLM with external knowledge drawn from a large collection of documents. RAG uses optimized vector databases to efficiently store embedding vectors and retrieve the entries most relevant to a given query.
Since OpenAI's o1 model series, many LLMs are capable of reasoning with an internal thought process. A new generation of RAG systems uses this reasoning ability to find relevant matches in a way that more closely resembles how humans search for information.
In this article, we will examine PageIndex, a vector-less, reasoning-based RAG algorithm that primarily considers a document's table of contents.
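To make the idea concrete, here is a minimal sketch of reasoning-based retrieval over a table-of-contents tree. The `TocNode` structure and the `choose` callback are illustrative stand-ins, not the actual PageIndex schema or API; in a real system, `choose` would be an LLM call that reasons about which section to descend into.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a table-of-contents tree, not the real PageIndex schema.
@dataclass
class TocNode:
    title: str
    summary: str = ""
    pages: tuple = (0, 0)          # (start_page, end_page) in the source document
    children: list = field(default_factory=list)

def retrieve(node, query, choose):
    """Walk the tree top-down; `choose` stands in for an LLM call that
    picks the child section most relevant to the query (or None to stop)."""
    while node.children:
        child = choose(query, node.children)
        if child is None:
            break
        node = child
    return node

# Toy demo with a keyword matcher in place of the LLM.
root = TocNode("Annual Report", children=[
    TocNode("Financial Statements", "Revenue, costs, balance sheet", (10, 40)),
    TocNode("Risk Factors", "Market and regulatory risks", (41, 60)),
])

def keyword_choose(query, children):
    for child in children:
        text = (child.title + " " + child.summary).lower()
        if any(word in text for word in query.lower().split()):
            return child
    return None

section = retrieve(root, "regulatory risks", keyword_choose)
print(section.title, section.pages)
```

The point of the sketch is the control flow: instead of a single nearest-neighbor lookup, retrieval is an iterative decision process that narrows down the document section by section.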
How Traditional Vector-Based RAG Works
The basic RAG system today consists of a collection of documents, an embedding model, a vector database for retrieval, and an LLM for generating answers.
The basic vector-based RAG architecture has an offline embedding phase and an online query processing phase. Image from [1]
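The two phases can be sketched in a few lines of Python. A toy bag-of-words counter stands in for a real embedding model, and a plain list stands in for a vector database; both are simplifying assumptions for illustration.

```python
import math
from collections import Counter

# Toy "embedding": a bag-of-words vector (a real system would use a
# learned embedding model producing dense vectors).
def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Offline phase: embed every document chunk and store it in the index.
chunks = [
    "The vector database stores embeddings of document chunks.",
    "Reasoning models search documents like a human would.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# Online phase: embed the query and return the most similar chunk.
query_vec = embed("how are embeddings stored")
best = max(index, key=lambda item: cosine(query_vec, item[1]))
print(best[0])
```

Note the asymmetry the figure describes: embedding the corpus happens once, offline, while the similarity search against the query runs at question time.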