Locality Sensitive Hashing, Jaccard Similarity, Duplicate Detection, Document Clustering

Efficient and accurate search in petabase-scale sequence repositories
nature.com·2d·
Discuss: Hacker News
🔄 Burrows-Wheeler
Sorting encrypted data without decryption: a practical trick
dev.to·13h·
Discuss: DEV
Hash Functions
An enough week
blog.mitrichev.ch·1d
📈 Linear programming
Nearest Neighbor CCP-Based Molecular Sequence Analysis
arxiv.org·1d
🔄 Burrows-Wheeler
DupeGuru lets you quickly find and remove duplicate files from your drives
techspot.com·1d
🔄 Content Deduplication
YouTube gets ~5% CTR lift on Shorts by replacing embedding tables with Semantic IDs
shaped.ai·1d
📊 Feed Optimization
Homomorphism Problems in Graph Databases and Automatic Structures
arxiv.org·1d
🔗 Graph Isomorphism
[R] DeepSeek 3.2's sparse attention mechanism
reddit.com·1d
🌀 Brotli Internals
Creating Real-Time Multimodal AI Pipelines: Scaling File Processing to 50M Daily Uploads
engineering.salesforce.com·5h
🌊 Stream Processing
Automated Spectral Fingerprint Deconvolution for Polymer Identification via Deep Oligomer Networks
dev.to·4h·
Discuss: DEV
🌈 Spectroscopy
Explicit Lossless Vertex Expanders!
gilkalai.wordpress.com·19h
💎 Information Crystallography
Indexing, Hashing
dev.to·1d·
Discuss: DEV
🚀 Query Optimization
Mind the Gap: Quantifying Vocabulary Mismatch in E-Commerce Site Search
searchhub.io·1d·
Discuss: Hacker News
📈 Search Quality
A gentle introduction to Generative AI: Historical perspective
medium.com·4h·
Discuss: Hacker News
🧠 Learned Codecs
Contrastive Weak-to-strong Generalization
arxiv.org·1d
⧗ Information Bottleneck
MetaGraph: Scalable annotated de Bruijn graphs for DNA indexing and alignment
github.com·1d·
Discuss: Hacker News
🔄 Burrows-Wheeler
Show HN: Rebuilt Bible search app to run 100% client-side with Transformers.js
biblos.app·7h·
Discuss: Hacker News
📜 Binary Philology
Fast-Convergent Proximity Graphs for Approximate Nearest Neighbor Search
arxiv.org·3d
Range Queries
My First Week of Vibecoding
underreacted.leaflet.pub·2h·
Discuss: Hacker News
🎯 Gradual Typing
Automated Copyright Infringement Detection via Semantic Fingerprinting and Dynamic Thresholding
dev.to·2d·
Discuss: DEV
👁️ Perceptual Hashing