๐Ÿฟ๏ธ ScourBrowse
📄 Text Chunking

Semantic Segmentation, Context Windows, Document Boundaries, Retrieval Units

Why Your Chunking Strategy Makes or Breaks Your AI System
medium.com · 5d · Discuss: Hacker News
📄 Semantic Chunking

StoryGem: Voronoi treemap Approach for Semantics-Preserving Text Visualization
arxiv.org · 1d
🔶 Voronoi Diagrams

davidchisnall/igk: I got Knuth'd: A compiler for documents
github.com · 17h
Concrete Syntax

Which Vision Language Models Should You Use for Your Apps
thenewstack.io · 2d
🤖 Advanced OCR

Contextualizing SUTRA: Advancements in Multilingual & Efficient LLMs
hackernoon.com · 7h
💻 Local LLMs

Inverse text-sizing based on text-length with attr()
daverupert.com · 9h
🖋 Typography

Clustering News Articles for Topic Detection: A Technical Deep Dive
dev.to · 3d · Discuss: DEV
📚 Document Clustering

How to Prove That An Email Was Received
metaspike.com · 4h
📄 Document Digitization

New: Improve Apache Iceberg query performance in Amazon S3 with sort and z-order compaction
aws.amazon.com · 1d
🔄 Burrows-Wheeler

ByteSpan: Information-Driven Subword Tokenisation
arxiv.org · 1d
💾 Binary Linguistics

June 25, 2025 Flight Tracking Workshop (4 hour) [Americas / Europe-friendly time]
bellingcat.com · 23h
🧮 Prolog Parsing

JupyterLab-PKM 0.1.12
electricarchaeology.ca · 5h
🌀 Brotli Internals

Markov-Enhanced Clustering for Long Document Summarization: Tackling the 'Lost in the Middle' Challenge with Large Language Models
arxiv.org · 1d
📥 Feed Aggregation

Agentic AI: Implementing Long-Term Memory
towardsdatascience.com · 1d
💾 Persistence Strategies

Detecting Machine-Generated Texts: Not Just "AI vs Humans" and Explainability is Complicated
arxiv.org · 19h
🧮 Kolmogorov Complexity

Practical tips to optimize documentation for LLMs, AI agents, and chatbots
biel.ai · 1d · Discuss: Hacker News
🤖 Archive Automation

The modern text processing pipeline: Overview
newroadoldway.com · 2d · Discuss: Lobsters, r/programming
🔤 Unicode Normalization

Portable Network Graphics (PNG) Specification (Third Edition)
w3.org · 1d · Discuss: Hacker News
🕸️ WebP Analysis

Automattic/harper: Offline, privacy-first grammar checker. Fast, open-source, Rust-powered
github.com · 1d
Concrete Syntax

The Bitter Lesson is coming for Tokenization
lucalp.dev · 1d · Discuss: Lobsters, Hacker News, r/programming
🔗 Monadic Parsing