SatoriDB: Billion scale embedded vector database

SatoriDB is an embedded vector database built for bigger than memory workloads for ANN search and handles billion scale datasets with 95%+ recall and predictable latencies.

Architecture

SatoriDB is a two-tier search system: a small "hot" index in RAM routes queries to "cold" vector data on disk. This lets us handle billion-scale datasets without holding everything in memory.

Routing (Hot Tier)

Quantized HNSW index over bucket centroids. Centroids are scalar-quantized (f32 → u8) so the whole routing index fits in RAM even at 500k+ buckets. When a query comes in, HNSW finds the top-K most relevant buckets in O(log N). We only search those, not the entire dataset.

Scanning (Cold Tier)

CPU-pinned Glommio wo…

Similar Posts

Loading similar posts...

Keyboard Shortcuts

Navigation
Next / previous item
j/k
Open post
oorEnter
Preview post
v
Post Actions
Love post
a
Like post
l
Dislike post
d
Undo reaction
u
Recommendations
Add interest / feed
Enter
Not interested
x
Go to
Home
gh
Interests
gi
Feeds
gf
Likes
gl
History
gy
Changelog
gc
Settings
gs
Browse
gb
Search
/
General
Show this help
?
Submit feedback
!
Close modal / unfocus
Esc

Press ? anytime to show this help