🐿️ Scour
📄 Semantic Chunking

Document Segmentation, Context Windows, Text Boundaries, Retrieval Units

Why Your Chunking Strategy Makes or Breaks Your AI System
medium.com·5d
Discuss: Hacker News
📄 Text Chunking
ByteSpan: Information-Driven Subword Tokenisation
arxiv.org·1d
💾 Binary Linguistics
davidchisnall/igk: I got Knuth'd: A compiler for documents
github.com·16h
Concrete Syntax
Machine Learning Fundamentals: active learning
dev.to·1d
Discuss: DEV
🤖 Grammar Induction
Which Vision Language Models Should You Use for Your Apps
thenewstack.io·2d
🤖 Advanced OCR
Markov-Enhanced Clustering for Long Document Summarization: Tackling the 'Lost in the Middle' Challenge with Large Language Models
arxiv.org·1d
📄 Text Chunking
New: Improve Apache Iceberg query performance in Amazon S3 with sort and z-order compaction
aws.amazon.com·1d
🔄 Burrows-Wheeler
Agentic AI: Implementing Long-Term Memory
towardsdatascience.com·1d
💾 Persistence Strategies
PDF Retrieval Augmented Question Answering
arxiv.org·1d
📊 Multi-vector RAG
Contextualizing SUTRA: Advancements in Multilingual & Efficient LLMs
hackernoon.com·7h
💻 Local LLMs
Could Open Table Formats End the Reign of Snowflake and Databricks?
prequel.co·5h
Discuss: Hacker News
📚 MARC Evolution
The Bitter Lesson is coming for Tokenization
lucalp.dev·1d
Discuss: Lobsters, Hacker News, r/programming
🔗 Monadic Parsing
V2T-CoT: From Vision to Text Chain-of-Thought for Medical Reasoning and Diagnosis
arxiv.org·19h
🤖 Advanced OCR
Clustering News Articles for Topic Detection: A Technical Deep Dive
dev.to·3d
Discuss: DEV
📚 Document Clustering
StoryGem: Voronoi treemap Approach for Semantics-Preserving Text Visualization
arxiv.org·1d
🔶 Voronoi Diagrams
Semantic-Aware Parsing for Security Logs
arxiv.org·1d
Log Parsing
Using an LLM for query planning in RAG -> 40% better answer relevance
techcommunity.microsoft.com·1d
Discuss: Hacker News
🔍 Information Retrieval
BPCLIP: A Bottom-up Image Quality Assessment from Distortion to Semantics Based on CLIP
arxiv.org·1d
🖼️ JPEG XL
Automattic/harper: Offline, privacy-first grammar checker. Fast, open-source, Rust-powered
github.com·1d
Concrete Syntax
Text2Struct: A Machine Learning Pipeline for Mining Structured Data from Text
arxiv.org·1d
🔤 Character Classification