Document Processing, Neural OCR, Multilingual Archives, Computational Philology

Indonesia’s film industry embraces AI to make Hollywood-style movies for cheap
restofworld.org·1d
🧠Learned Codecs
The artificial complexity of OOXML files (the PPTX case)
blog.documentfoundation.org·1d
📟Terminal Typography
Speech-to-Retrieval (S2R): A new approach to voice search
research.google·4d
🎙️Whisper
Can an LLM Be a Black-Box Optimizer?
posgeo.wordpress.com·16h
🧮Kolmogorov Bounds
LLMs Are Transpilers (2025-10-10)
alloc.dev·2d
🔄Language Evolution
Kurzgesagt - In a Nutshell: AI Slop Is Destroying The Internet
dev.to·4h
🗜️LZW Variants
Aligning Large Language Models via Fully Self-Synthetic Data
arxiv.org·2d
🔗Monadic Parsing
How Google Translate & ChatGPT Work: The Transformer, Unboxed
dev.to·2d
🧠Learned Codecs
High-Throughput Reactive Sputtering Process Optimization via Adaptive Machine Learning Control
dev.to·9h
📄Document Digitization
Machines in the Crowd? Measuring the Footprint of Machine-Generated Text on Reddit
arxiv.org·2d
🏛Digital humanities
TIGeR: Tool-Integrated Geometric Reasoning in Vision-Language Models for Robotics
arxiv.org·2d
🌀Differential Geometry
Krish Naik: Complete RAG Crash Course With Langchain In 2 Hours
dev.to·1d
📊Multi-vector RAG
The Custom Conveyor: Building Your Own Iterators
dev.to·20h
🔄Burrows-Wheeler
As precise as the railway is on time: German companies' demands on AI systems
heise.de·11h
👁️Observatory Systems
Contrastive Weak-to-strong Generalization
arxiv.org·1d
Information Bottleneck
Beyond Vector Search: Building a RAG That *Actually* Understands Your Data
dev.to·2d
🗂️Vector Databases
Expanding the Action Space of LLMs to Reason Beyond Language
arxiv.org·1d
💻Local LLMs
Causality Guided Representation Learning for Cross-Style Hate Speech Detection
arxiv.org·1d
🎙️Whisper