Document search using Claude and an inverted index.
annanay.dev·4d·

So, vector databases are pretty popular right now for document search, but they have some drawbacks: choosing the right embedding model, finding the right chunk size, indexing high-dimensional embeddings, and managing specialized infrastructure.

This complexity might be unnecessary for some use cases that just have a bunch of docs in S3 and want to RAG over them.

I started with the thesis that LLM-generated query sets combined with a good index can match vector DB performance.

The idea

I decided to implement the simplest idea:

  1. I used an LLM (Claude Sonnet 4.5 in this case) to generate possible keywords that could be present in documents related to the search query. I played around with a few numbers until I landed on 10 query sets each containing 2 keywords. Looking a…
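
To make the query-set idea concrete, here is a minimal sketch of how keyword sets might be matched against an inverted index. It is not the post's actual implementation: the LLM call is replaced by hard-coded `query_sets`, the names `tokenize`, `build_index`, and `search` are hypothetical, and the matching rule (every keyword in a set must appear in the document, with documents ranked by how many sets they satisfy) is an assumption made for illustration. The example uses three query sets instead of ten to stay short.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase the text and split it into alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    # Inverted index: token -> set of document ids containing that token.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query_sets):
    # Assumed scoring rule: a document matches a query set only if it
    # contains every keyword in that set; its score is the number of
    # query sets it matches.
    hits = Counter()
    for keywords in query_sets:
        postings = [index.get(k.lower(), set()) for k in keywords]
        if not postings:
            continue
        for doc_id in set.intersection(*postings):
            hits[doc_id] += 1
    return hits.most_common()


# Toy corpus standing in for documents stored in S3.
docs = {
    "a.txt": "Vector databases store high dimensional embeddings for semantic search.",
    "b.txt": "An inverted index maps each keyword to the documents that contain it.",
    "c.txt": "Chunk size and embedding models affect retrieval quality.",
}

# Stand-in for LLM output: in the post these come from the model, here
# they are hard-coded so the sketch runs without any API access.
query_sets = [
    ["inverted", "index"],
    ["keyword", "documents"],
    ["embeddings", "search"],
]

results = search(build_index(docs), query_sets)
print(results)  # [('b.txt', 2), ('a.txt', 1)]
```

Requiring all keywords in a set keeps each set precise, while counting matches across sets gives a simple ranking signal. Other combinations (OR within a set, or weighting by term frequency) would fit the same index.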
