Semantic Segmentation, Context Windows, Document Boundaries, Retrieval Units

RAG Chunking Strategies That Actually Work (and Why Most Don’t)
dev.to·8h
🧪Archive Fuzzing
From Segments to Concepts: Interpretable Image Classification via Concept-Guided Segmentation
arxiv.org·13h
🧠Machine Learning
Eliminating the Precision–Latency Trade-Off in Large-Scale RAG
thenewstack.io·4d
🎯Retrieval Systems
An alternative to knowledge graphs for storing loosely structured content
fleetingswallow.com·2d
🕸️Knowledge Graphs
Algorithmic Archive Project: Use Cases (1/3)
blogs.bodleian.ox.ac.uk·7h
📊Citation Graphs
Detecting Semantic Clones of Unseen Functionality
arxiv.org·13h
🔗Binary Similarity
The Legacy Code Survival Guide: Add Features Without Fear
understandlegacycode.com·7h
🔓Decompilation
Show HN: Sweep, AI autocomplete for JetBrains that rewrites code
sweep.dev·53m
🌳Incremental Parsing
Teaching Models to Decide When to Retrieve: Adaptive RAG, Part 4
blog.reachsumit.com·1d
🧠Learned Indexing
Latency vs. Accuracy for LLM Apps — How to Choose and How a Memory Layer Lets You Win Both
dev.to·6h
Performance Mythology
LLM Optimization Notes: Memory, Compute and Inference Techniques
gaurigupta19.github.io·1d
💻Local LLMs
Automating construction safety inspections using a multi-modal vision-language RAG framework
arxiv.org·13h
🤖Advanced OCR
Tritium | Thoughts on the Word Spec in Rust
tritium.legal·1d
🦀Rust Macros
GPT-5-Codex is a better AI researcher than me
seangoedecke.com·17h
🧠Intelligence Compression
Paper2Video: Automatic Video Generation from Scientific Papers
arxiv.org·13h
📊Document Wavelets
Detecting Distillation Data from Reasoning Models
arxiv.org·13h
⚙️ABNF Mining
Memory leaks: the forgotten side of web performance (2022)
nolanlawson.com·38m
🧠Memory Forensics
Defining a Standard Taxonomy for Segmentation
blogs.cisco.com·5h
🎯Threat Hunting
SliceMoE: Routing Embedding Slices Instead of Tokens for Fine-Grained and Balanced Transformer Scaling
arxiv.org·13h
🧮Kolmogorov Complexity
ALHD: A Large-Scale and Multigenre Benchmark Dataset for Arabic LLM-Generated Text Detection
arxiv.org·13h
📝Text Parsing