Retrieval Augmented Generation (RAG) is often associated with vector search. And while that is a primary use case, any search will do. This article will go over a few RAG examples covering different retrieval methods. These examples require txtai 9.3+.

Install txtai and all dependencies.

```bash
pip install txtai[pipeline-data]

# Download example SQL database
wget https://huggingface.co/NeuML/txtai-wikipedia-slim/resolve/main/documents
```

The first example will cover RAG with ColBERT / Late Interaction retrieval. txtai 9.0 added support for MUVERA and ColBERT multi-vector ranking. We'll build a pipeline that reads the ColBERT v2 paper, extracts the text into sections and builds an index with a ColBERT model. Then we'll wrap that as a Reranker pipeline using the same ColBERT model. Finally, a RAG pipeline will utilize this for retrieval.

Note: This uses the custom ColBERT Muvera Nano model, which is only 970K parameters. That's right, thousands. It's surprisingly effective.

This paper introduces ColBERTv2, a neural information retrieval model that enhances the quality and efficiency of late interaction by combining an aggressive residual compression mechanism with a denoised supervision strategy, achieving state-of-the-art performance across diverse benchmarks while reducing the model's space footprint by 6–10× compared to previous methods.

Next we'll run a RAG pipeline using a web search as the retrieval method.

Artificial intelligence (AI) is the capability of computational systems to perform tasks typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making. It involves technologies like machine learning, deep learning, and natural language processing, and enables machines to simulate human-like learning, comprehension, problem solving, decision-making, creativity, and autonomy.

The last example we'll cover is running RAG with a SQL query.
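As a preview, here's a minimal sketch of what SQL-based retrieval for RAG could look like. The `sections` table and `text` column below are hypothetical stand-ins, the actual schema of the downloaded database may differ.

```python
import sqlite3

def sql_retrieve(connection, keyword, limit=3):
    # Pull rows whose text contains the keyword; `sections` and `text`
    # are hypothetical names, not the real schema of the example database
    cursor = connection.execute(
        "SELECT text FROM sections WHERE text LIKE ? LIMIT ?",
        (f"%{keyword}%", limit),
    )
    return [row[0] for row in cursor]

# Tiny in-memory stand-in database to demonstrate the idea
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE sections (text TEXT)")
connection.executemany(
    "INSERT INTO sections VALUES (?)",
    [
        ("The World Series is the championship series of Major League Baseball.",),
        ("Basketball is played with five players per side.",),
    ],
)

print(sql_retrieve(connection, "World Series"))
```

The results of a call like this become the context passed to the LLM prompt, exactly as the vector and web search examples do.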
We'll use the SQL database that's a component of the txtai-wikipedia-slim embeddings database. Since this is just a database with Wikipedia abstracts, we'll need a way to build a SQL query from a search query. For that, we'll use an LLM to extract a keyword to use in a clause.

Given that the LLM used was released in August 2025, let's ask it a question that can only be accurately answered with external data: Who won the 2025 World Series? The series ended in November, after the model's release.

In the 2025 World Series, the Los Angeles Dodgers defeated the Toronto Blue Jays in seven games to win the championship. The series took place from October 24 to November 1 (ending early on November 2, Toronto time). Dodgers pitcher Yoshinobu Yamamoto was named the World Series MVP. The series was televised by Fox in the United States and by Sportsnet in Canada.

This article showed that RAG is about much more than vector search. With txtai 9.3+, any callable method is now supported for retrieval. Enjoy!
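To make the "any callable" point concrete, here is a generic sketch of the pattern all three examples share: retrieval is just a function from a question to context strings, and the prompt is assembled from whatever that function returns. The names and data below are illustrative, not txtai APIs.

```python
def build_prompt(question, retrieve):
    # `retrieve` is any callable mapping a question to a list of
    # context strings: vector search, web search, SQL, anything
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# A trivial retrieval callable for demonstration
def keyword_lookup(question):
    facts = {"sky": "The sky appears blue due to Rayleigh scattering."}
    return [fact for key, fact in facts.items() if key in question.lower()]

print(build_prompt("Why is the sky blue?", keyword_lookup))
```

Swapping the retrieval callable, while keeping the prompt assembly and LLM call fixed, is all that distinguishes the ColBERT, web search and SQL examples above.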