OCR vs ADE: Mechanisms Behind the Methods
dev.to · 19h
📄OCR
CIR-CoT: Towards Interpretable Composed Image Retrieval via End-to-End Chain-of-Thought Reasoning
arxiv.org · 41m
🧮Vector Embeddings
RND1: Simple, Scalable AR-to-Diffusion Conversion
radicalnumerics.ai · 8h
💻Local LLMs
Unlocking Image Understanding: A New Path to Visual AI for Everyone
dev.to · 10h
🤖AI Paleography
Work in content? You should be using AI for alt text
tk.gg · 5h
📄PostScript
Is Architectural Complexity Always the Answer? A Case Study on SwinIR vs. an Efficient CNN
arxiv.org · 41m
⧗Information Bottleneck
Guide to OCI AI Certification: From Machine Learning Basics to Advanced Neural Networks
dev.to · 1d
📄Document AI
IASC: Interactive Agentic System for ConLangs
arxiv.org · 41m
🌳Context free grammars
To Sink or Not to Sink: Visual Information Pathways in Large Vision-Language Models
arxiv.org · 41m
📊Learned Metrics
Hunyuan Image 3.0 – AI Image Generator (Text-to-Image)
hunyuanimage.online · 20h
📸PNG Optimization
Show HN: Lore Engine – Turn 10-hour lectures into 2 hours of comprehensive notes
github.com · 8h
📄Document Streaming
TIGeR: Tool-Integrated Geometric Reasoning in Vision-Language Models for Robotics
arxiv.org · 1d
🌀Differential Geometry
The key to conversational speech recognition
datasciencecentral.com · 10h
🎵Audio ML
Evaluating OCR performance on food packaging labels in South Africa
arxiv.org · 3d
📄OCR
SPAD: Specialized Prefill and Decode Hardware for Disaggregated LLM Inference
arxiv.org · 41m
💻Local LLMs
How the Rise of Tabular Foundation Models Is Reshaping Data Science
towardsdatascience.com · 15h
🧠Machine Learning
TALENT: Table VQA via Augmented Language-Enhanced Natural-text Transcription
arxiv.org · 1d
🎙️Whisper
Addressing the ID-Matching Challenge in Long Video Captioning
arxiv.org · 1d
Vector Similarity
Evaluating Small Vision-Language Models on Distance-Dependent Traffic Perception
arxiv.org · 41m
🧠Machine Learning
Curriculum Learning with Synthetic Data for Enhanced Pulmonary Nodule Detection in Chest Radiographs
arxiv.org · 41m
👁️OCR Enhancement