Bloom filters are good for search that does not scale
notpeerreviewed.com·11h·
Discuss: r/programming
Flag this post

A great blog post from 2013 describes using bloom filters to build a space-efficient full text search index for small numbers of documents. The algorithm is simple: Per document, create a bloom filter of all its words. To query, simply check each document’s bloom filter for the query terms.

With a query time complexity of O(number-of-documents), we can forget about using this on big corpuses, right? In this blog post I propose a way of scaling the technique to large document corpuses (e.g. the web) and discuss why that is a bad idea.

Fun fact: There is a nice implementation of this exact algorithm that is still used in the wild. But let’s get into it.

Why even try thi…

Similar Posts

Loading similar posts...