🖼️ Multimodal AI
vision-language models, VLM, image-text, multimodal learning
Scoured 308 posts in 6.3 ms
OpenCV 5.0 Computer Vision Library Released with Rewritten DNN Engine
👁️
Computer Vision
linuxiac.com
·
3d
3 days ago
VL-DINO: Leveraging CLIP Vision-Language Knowledge for Open-Vocabulary Object Detection
👁️
Computer Vision
Content type:
Academic
arxiv.org
·
20h
20 hours ago
itsperini/viscribe: Image intelligence layer for AI agents
🤖
AI Agents
Content type:
Code
github.com
·
7h
7 hours ago
·
Hacker News
Seeing Before Colliding: Anticipatory Safe RL with Frozen Vision-Language Models
🏷️
Label Noise
Content type:
Academic
arxiv.org
·
20h
20 hours ago
Reroute, Don't Remove: Recoverable Visual Token Routing for Vision-Language Models
🧠
LLMs
Content type:
Academic
arxiv.org
·
20h
20 hours ago
Task-Aligned Stability Analysis of Vision-Language Models for Autonomous Driving Hazard Detection
🏷️
Label Noise
Content type:
Academic
arxiv.org
·
20h
20 hours ago
ChinaHeritaQA: A Culturally-Grounded Visual Question Answering Dataset for World Heritage Sites in China
🧠
LLMs
Content type:
Academic
arxiv.org
·
2d
2 days ago
An Effective Router for Vision-Language Model Selection
🧠
LLMs
Content type:
Academic
arxiv.org
·
2d
2 days ago
NVIDIA/cosmos: NVIDIA Cosmos is an open platform of world models, datasets, and tools that enables developers to build Physical AI for robots, autonomous vehicles, smart infrastructure, and more.
🔌
MCP
Content type:
Code
github.com
·
6d
6 days ago
4DP-QA: Scalable QA for 4D Perception in Vision Language Models
🏷️
Label Noise
Content type:
Academic
arxiv.org
·
20h
20 hours ago
AVIS: Adaptive Test-Time Scaling for Vision-Language Models
✍️
Prompt Engineering
Content type:
Academic
arxiv.org
·
20h
20 hours ago
AgenticNav: Zero-Shot Vision-and-Language Navigation as a Tool-Calling Harness
🤖
AI Agents
Content type:
Academic
arxiv.org
·
1d
1 day ago
APT: Action Expert Pretraining Improves Instruction Generalization of Vision-Language-Action Policies
📊
OOD Detection
Content type:
Academic
arxiv.org
·
20h
20 hours ago
A Controlled Audit of Pretraining Contamination in Public Medical Vision-Language Benchmarks
🏷️
Label Noise
Content type:
Academic
arxiv.org
·
1d
1 day ago
Do VLMs Reason Like Engineers? A Benchmark and a Stage-wise Evaluation
✍️
Prompt Engineering
Content type:
Academic
arxiv.org
·
1d
1 day ago
OpenMedReason: Scientific Reasoning Supervision for Medical Vision-Language Models
✍️
Prompt Engineering
Content type:
Academic
arxiv.org
·
20h
20 hours ago
Textual Supervision Enhances Geospatial Representations in Vision-Language Models
🧠
LLMs
Content type:
Academic
arxiv.org
·
3d
3 days ago
MemoVAD: Resource-Efficient Video Anomaly Detection via Dynamic Semantic Memory in Edge Computing Scenarios
🏷️
Label Noise
Content type:
Academic
arxiv.org
·
2d
2 days ago
From Prompts to Tokens: Internalizing Causal Supervision in Vision-Language Model for Multi-Image Causal Reasoning
🏷️
Label Noise
Content type:
Academic
arxiv.org
·
20h
20 hours ago
Are Reasoning Vision-Language Models Robust to Semantic Visual Distractions?
🏷️
Label Noise
Content type:
Academic
arxiv.org
·
2d
2 days ago