๐Ÿฟ๏ธ ScourBrowse
📄 Text Chunking

Semantic Segmentation, Context Windows, Document Boundaries, Retrieval Units

Why Your Chunking Strategy Makes or Breaks Your AI System
medium.com · 4d · Discuss: Hacker News
📄 Semantic Chunking
StoryGem: Voronoi treemap Approach for Semantics-Preserving Text Visualization
arxiv.org · 1d
🔶 Voronoi Diagrams
davidchisnall/igk: I got Knuth'd: A compiler for documents
github.com · 10h
Concrete Syntax
Which Vision Language Models Should You Use for Your Apps
thenewstack.io · 1d
🤖 Advanced OCR
Why Your Next LLM Might Not Have A Tokenizer
towardsdatascience.com · 21h
🤖 Grammar Induction
Inverse text-sizing based on text-length with attr()
daverupert.com · 2h
🖋 Typography
Clustering News Articles for Topic Detection: A Technical Deep Dive
dev.to · 3d · Discuss: DEV
📚 Document Clustering
June 25, 2025 Flight Tracking Workshop (4 hour) [Americas / Europe-friendly time]
bellingcat.com · 16h
🧮 Prolog Parsing
New: Improve Apache Iceberg query performance in Amazon S3 with sort and z-order compaction
aws.amazon.com · 20h
🔄 Burrows-Wheeler
ByteSpan: Information-Driven Subword Tokenisation
arxiv.org · 1d
💾 Binary Linguistics
Markov-Enhanced Clustering for Long Document Summarization: Tackling the 'Lost in the Middle' Challenge with Large Language Models
arxiv.org · 1d
📥 Feed Aggregation
Detecting Machine-Generated Texts: Not Just "AI vs Humans" and Explainability is Complicated
arxiv.org · 12h
🧮 Kolmogorov Complexity
Practical tips to optimize documentation for LLMs, AI agents, and chatbots
biel.ai · 21h · Discuss: Hacker News
🤖 Archive Automation
Portable Network Graphics (PNG) Specification (Third Edition)
w3.org · 20h · Discuss: Hacker News
🕸️ WebP Analysis
PDF Retrieval Augmented Question Answering
arxiv.org · 1d
📊 Multi-vector RAG
The modern text processing pipeline: Overview
newroadoldway.com · 1d · Discuss: Lobsters, r/programming
🔤 Unicode Normalization
What LLMs Know About Their Users
schneier.com · 5h · Discuss: Hacker News
💻 Local LLMs
Automattic/harper: Offline, privacy-first grammar checker. Fast, open-source, Rust-powered
github.com · 1d
Concrete Syntax
How to sync Context across AI Assistants (ChatGPT, Claude, Perplexity...) in your browser
dev.to · 1d · Discuss: DEV
🖥️ Modern Terminals
BPCLIP: A Bottom-up Image Quality Assessment from Distortion to Semantics Based on CLIP
arxiv.org · 1d
🖼️ JPEG XL