📄 Text Chunking - matmat · Scour

Building a semantic search engine in ±250 lines of Python

bart.degoe.de·1d

🗂️Vector Search

PaddleOCR-VL-1.5: A 0.9B Vision-Language OCR Model Built for Real-World Documents

hackernoon.com·13h

🤖Advanced OCR

LLaDA2.1: Speeding Up Text Diffusion via Token Editing

arxiv.org·1d

📃Manuscript Tokenization

Large Language Models for Mortals book

andrewpwheeler.com·3h

The State of Agentic Graph RAG

localoptimumai.substack.com·1d·

Discuss: Substack

🧮Datalog Systems

Snippets With Regular Expressions

irreal.org·21h

🌳Incremental Parsing

Document Clustering with LLM Embeddings in Scikit-learn

machinelearningmastery.com·1d

🧮Vector Embeddings

Building a Regex Engine with a team of parallel Claudes

lesswrong.com·13h

🔍RegEx Engines

Geospatial-Reasoning-Driven Vocabulary-Agnostic Remote Sensing Semantic Segmentation

arxiv.org·1d

📄Semantic Chunking

Documenting automatic text transcription tools in catalogues/metadata/displays?

openobjects.org.uk·2d

Extract structured data from any website. Real-time search API with JSON, Markdown & HTML output.

searchresult.dev·1d·

Discuss: Hacker News

🕵️Feed Discovery

EdgeQuake: Rust-powered RAG framework for production knowledge graphs

news.ycombinator.com·1d·

Discuss: Hacker News

🦀Rust Borrowing

Towards a Standard for JSON Document Databases

muratbuffalo.blogspot.com·1d·

Discuss: Blogger

📊Graph Databases

'Tech bros are predicting the end of work as we know it thanks to AI, but struggle to envision what comes next'

lemonde.fr·11h

🛡Cybersecurity

Boundary Issues

notes.billmill.org·19h

🔶Voronoi Diagrams

Semantic Design Tokens That Scale Across Platforms

hackernoon.com·7h

🔗Hypermedia APIs

Rethinking TXT Files

dataabinitio.com·1d

✅Format Validation

How I Cut My Google Search Dependence in Half

hister.org·23h·

Discuss: Lobsters, Hacker News

opendatalab/OmniDocBench: [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation

github.com·2d·

Discuss: Hacker News

👁️Constructive OCR

Webmentions with batteries included

blog.fabiomanganiello.com·46m·

Discuss: Lobsters, Hacker News

🔗Hypermedia APIs
