Scour
📄 Semantic Chunking

Document Segmentation, Context Windows, Text Boundaries, Retrieval Units

Why Your Chunking Strategy Makes or Breaks Your AI System
medium.com · 4d · Discuss: Hacker News
📄 Text Chunking
ByteSpan: Information-Driven Subword Tokenisation
arxiv.org · 1d
💾 Binary Linguistics
davidchisnall/igk: I got Knuth'd: A compiler for documents
github.com · 12h
Concrete Syntax
Machine Learning Fundamentals: active learning
dev.to · 1d · Discuss: DEV
🤖 Grammar Induction
Which Vision Language Models Should You Use for Your Apps
thenewstack.io · 2d
🤖 Advanced OCR
Markov-Enhanced Clustering for Long Document Summarization: Tackling the 'Lost in the Middle' Challenge with Large Language Models
arxiv.org · 1d
📄 Text Chunking
New: Improve Apache Iceberg query performance in Amazon S3 with sort and z-order compaction
aws.amazon.com · 22h
🔄 Burrows-Wheeler
Agentic AI: Implementing Long-Term Memory
towardsdatascience.com · 22h
💾 Persistence Strategies
PDF Retrieval Augmented Question Answering
arxiv.org · 1d
📊 Multi-vector RAG
Contextualizing SUTRA: Advancements in Multilingual & Efficient LLMs
hackernoon.com · 2h
💻 Local LLMs
Could Open Table Formats End the Reign of Snowflake and Databricks?
prequel.co · 50m · Discuss: Hacker News
📚 MARC Evolution
Portable Network Graphics (PNG) Specification (Third Edition)
w3.org · 21h · Discuss: Hacker News
🕸️ WebP Analysis
The Bitter Lesson is coming for Tokenization
lucalp.dev · 1d · Discuss: Lobsters, Hacker News, r/programming
🔗 Monadic Parsing
V2T-CoT: From Vision to Text Chain-of-Thought for Medical Reasoning and Diagnosis
arxiv.org · 14h
🤖 Advanced OCR
StoryGem: Voronoi treemap Approach for Semantics-Preserving Text Visualization
arxiv.org · 1d
🔶 Voronoi Diagrams
Clustering News Articles for Topic Detection: A Technical Deep Dive
dev.to · 3d · Discuss: DEV
📚 Document Clustering
Semantic-Aware Parsing for Security Logs
arxiv.org · 1d
Log Parsing
Using an LLM for query planning in RAG -> 40% better answer relevance
techcommunity.microsoft.com · 22h · Discuss: Hacker News
Information Retrieval
BPCLIP: A Bottom-up Image Quality Assessment from Distortion to Semantics Based on CLIP
arxiv.org · 1d
🖼️ JPEG XL
Automattic/harper: Offline, privacy-first grammar checker. Fast, open-source, Rust-powered
github.com · 1d
Concrete Syntax