Multimodal AI


An Effective Router for Vision-Language Model Selection

 Edge AI  Content type: Academic
arxiv.org

What TTS Throws Away

 Gemini
amaldavid.com · Hacker News

linzhiqiu/t2v_metrics: Evaluating text-to-image/video/3D models with VQAScore

 📋Text Quality  Content type: Code
github.com · Hacker News

OpenCV 5.0 Released With Rewritten DNN Engine, Built-In LLM & VLM Support

 🧠Machine Learning
phoronix.com · Hacker News

know the mother tongue of your LLMs

 🇨🇳Chinese AI

Do VLMs Reason Like Engineers? A Benchmark and a Stage-wise Evaluation

 📋Text Quality  Content type: Academic
arxiv.org

A Visual Guide to Gemma 4 12B

 🤖AI

Are Reasoning Vision-Language Models Robust to Semantic Visual Distractions?

 🛡️AI Safety  Content type: Academic
arxiv.org

vishal-dehurdle/state-harness: Runtime safety net for LLM agents. Detects token spirals, kills doomed tasks early, tells you exactly why. Rust core, Python SDK. pip install state-harness

 🤖AI  Content type: Code
github.com · Hacker News

A Controlled Audit of Pretraining Contamination in Public Medical Vision-Language Benchmarks

 🦉Qwen  Content type: Academic
arxiv.org

OpenCV 5 Is Here: The Biggest Leap in Years for Computer Vision

 🧠Machine Learning

One Stone, Three Birds: Self-adaptive Optimal Transport for Multi-VLM Selection, Adaptation, and Ensembling

 Edge AI  Content type: Academic
arxiv.org

AIchain Skill: A Prompt as a Reusable Object

 🤖AI  Content type: Code
github.com · DEV

Launch HN: Transload (YC P26) – Measuring freight items with CCTV

 👁️Surveillance Capitalism  Content type: Discussion

Two Bridges, One Pathway: From VLMs to Generalizable VLAs with Embodied Trajectory-Coupled Data

 Gemini  Content type: Academic
arxiv.org

Slow Token, Fast Action – Learning in Robotics

 🧠Machine Learning  Content type: Blog

One Token per Multimodal Evidence: Latent Memory for Resource-Constrained QA

 🧠Agent Memory  Content type: Academic
arxiv.org

MLingualFC: Evaluating Jailbreak Vulnerabilities in Multilingual Vision-Language Models

 🕳LLM Vulnerabilities  Content type: Academic
arxiv.org

Lean Inference Workflows: Applying "Lean" Concepts To Building AI Agents

 🎭Claude  Content type: Blog

MotionEnhancer: Leveraging Video Diffusion for Motion-Enhanced Vision-Language Models

 Edge AI  Content type: Academic
arxiv.org
