Multimodal AI


OpenCV 5.0 Computer Vision Library Released with Rewritten DNN Engine

 👁️Computer Vision
linuxiac.com

VL-DINO: Leveraging CLIP Vision-Language Knowledge for Open-Vocabulary Object Detection

 👁️Computer Vision  Content type: Academic
arxiv.org

itsperini/viscribe: Image intelligence layer for AI agents

 🤖AI Agents  Content type: Code
github.com · Hacker News

Seeing Before Colliding: Anticipatory Safe RL with Frozen Vision-Language Models

 🏷️Label Noise  Content type: Academic
arxiv.org

Reroute, Don't Remove: Recoverable Visual Token Routing for Vision-Language Models

 🧠LLMs  Content type: Academic
arxiv.org

Task-Aligned Stability Analysis of Vision-Language Models for Autonomous Driving Hazard Detection

 🏷️Label Noise  Content type: Academic
arxiv.org

ChinaHeritaQA: A Culturally-Grounded Visual Question Answering Dataset for World Heritage Sites in China

 🧠LLMs  Content type: Academic
arxiv.org

An Effective Router for Vision-Language Model Selection

 🧠LLMs  Content type: Academic
arxiv.org

NVIDIA/cosmos: NVIDIA Cosmos is an open platform of world models, datasets, and tools that enables developers to build Physical AI for robots, autonomous vehicles, smart infrastructure, and more.

 🔌MCP  Content type: Code
github.com

4DP-QA: Scalable QA for 4D Perception in Vision Language Models

 🏷️Label Noise  Content type: Academic
arxiv.org

AVIS: Adaptive Test-Time Scaling for Vision-Language Models

 ✍️Prompt Engineering  Content type: Academic
arxiv.org

AgenticNav: Zero-Shot Vision-and-Language Navigation as a Tool-Calling Harness

 🤖AI Agents  Content type: Academic
arxiv.org

APT: Action Expert Pretraining Improves Instruction Generalization of Vision-Language-Action Policies

 📊OOD Detection  Content type: Academic
arxiv.org

A Controlled Audit of Pretraining Contamination in Public Medical Vision-Language Benchmarks

 🏷️Label Noise  Content type: Academic
arxiv.org

Do VLMs Reason Like Engineers? A Benchmark and a Stage-wise Evaluation

 ✍️Prompt Engineering  Content type: Academic
arxiv.org

OpenMedReason: Scientific Reasoning Supervision for Medical Vision-Language Models

 ✍️Prompt Engineering  Content type: Academic
arxiv.org

Textual Supervision Enhances Geospatial Representations in Vision-Language Models

 🧠LLMs  Content type: Academic
arxiv.org

MemoVAD: Resource-Efficient Video Anomaly Detection via Dynamic Semantic Memory in Edge Computing Scenarios

 🏷️Label Noise  Content type: Academic
arxiv.org

From Prompts to Tokens: Internalizing Causal Supervision in Vision-Language Model for Multi-Image Causal Reasoning

 🏷️Label Noise  Content type: Academic
arxiv.org

Are Reasoning Vision-Language Models Robust to Semantic Visual Distractions?

 🏷️Label Noise  Content type: Academic
arxiv.org
