Locality Sensitive Hashing, Jaccard Similarity, Duplicate Detection, Document Clustering

Welcome to LIL’s Data.gov Archive Search
lil.law.harvard.edu·9h
💾Data Preservation
We Benchmarked DuckDB, SQLite, and Pandas on 1M Rows: Here’s What Happened
kdnuggets.com·15h
💾SQLite
Beyond Indexes: How Open Table Formats Optimize Query Performance
jack-vanlightly.com·2d
🚀Query Optimization
Automated Anomaly Detection in Account Takeover via Multi-Modal Graph Neural Network Fusion
dev.to·57m
🔍Vector Forensics
Making GitHub Issues Search Suck Less with BigQuery, Vertex AI and CloudQuery
cloudquery.io·1d
🌐Federated Search
LINQ and Learning to Be Declarative
nickstambaugh.dev·1d
🔗Concatenative Programming
CIR-CoT: Towards Interpretable Composed Image Retrieval via End-to-End Chain-of-Thought Reasoning
arxiv.org·1d
🧮Vector Embeddings
The Bit Shift Paradox: How "Optimizing" Can Make Code 6× Slower
hackernoon.com·3d
🧮Compute Optimization
Experimenting with ACL2 and Claude Code
mikedodds.org·16h
👑Isabelle
Can AI Co-Design Distributed Systems? Scaling from 1 GPU to 1k
harvard-edge.github.io·7h
🎯Performance Proofs
I built a translator for spatial thinking (because I can't interview in Python)
graemefawcett.ca·10h
🔗Concatenative Programming
Curing Miracle Steps in LLM Mathematical Reasoning with Rubric Rewards
arxiv.org·1d
🧮Theorem Proving
LLM Optimization Notes: Memory, Compute and Inference Techniques
gaurigupta19.github.io·4d
💻Local LLMs
Exponential Error Bounds for Information Bottleneck Source Coding Problems
arxiv.org·1d
📐Compression Bounds
MARC: Memory-Augmented RL Token Compression for Efficient Video Understanding
arxiv.org·1d
🧠Learned Codecs
Operationalizing Data Minimization for Privacy-Preserving LLM Prompting
arxiv.org·4d
💻Local LLMs
Activation Alchemist: Sculpting Stability with Functional Signatures
dev.to·9h
🔍Concolic Testing