👁️ Multimodal AI
Vision-Language, Image-Text, Multimodal Models
VL-DINO: Leveraging CLIP Vision-Language Knowledge for Open-Vocabulary Object Detection
🎨 Generative AI · Content type: Academic · arxiv.org · 5 hours ago
From Prompts to Tokens: Internalizing Causal Supervision in Vision-Language Model for Multi-Image Causal Reasoning
🎨 Generative AI · Content type: Academic · arxiv.org · 5 hours ago
Decoding Pedestrian Crossing Intention from Egocentric Vision via Vision Language Models
🤖 Large Language Models · Content type: Academic · arxiv.org · 2 days ago
Diagnosing Visual Ignorance in Vision-Language Models
🧠 LLM · Content type: Academic · arxiv.org · 3 days ago
The Last Visible Pixel: Probing Fine-Scale Perception in Vision-Language Models
🧠 LLM · Content type: Academic · arxiv.org · 2 days ago
Vision Language Model Helps Private Information De-Identification in Vision Data
🎨 Generative AI · Content type: Academic · arxiv.org · 2 days ago
Almieyar-Oryx-BloomBench: A Bilingual Multimodal Benchmark for Cognitively Informed Evaluation of Vision-Language Models
🎨 Generative AI · Content type: Academic · arxiv.org · 6 days ago
OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics
🤖 AI Agents · Content type: Academic · arxiv.org · 2 days ago
EasyLens: A Training-Free Plug-and-Play Subtle-Lesion Representation Amplifier for Medical Vision-Language Models
🎨 Generative AI · Content type: Academic · arxiv.org · 6 days ago
AgenticNav: Zero-Shot Vision-and-Language Navigation as a Tool-Calling Harness
🎨 Generative AI · Content type: Academic · arxiv.org · 1 day ago
Do VLMs Reason Like Engineers? A Benchmark and a Stage-wise Evaluation
💬 Prompt Engineering · Content type: Academic · arxiv.org · 1 day ago
PlanBench-V: A Spatial Planning Map Benchmark for Vision-Language Models
🤖 ChatGPT · Content type: Academic · arxiv.org · 6 days ago
CLASP: Language-Driven Robot Skill Selection and Composition using Task-Parameterized Learning
💬 Prompt Engineering · Content type: Academic · arxiv.org · 2 days ago
Adversarial Attacks Already Tell the Answer: Directional Bias-Guided Test-time Defense for Vision-Language Models
🎨 Generative AI · Content type: Academic · arxiv.org · 6 days ago
A Dataset for Dynamic Human Preferences for Vision Language Models
🧠 LLM · Content type: Academic · arxiv.org · 2 days ago
Harnessing Streaming Video in the Wild
🔍 Feed Discovery · Content type: Academic · arxiv.org · 2 days ago
Seeing Time: Benchmarking Chronological Reasoning and Shortcut Biases in Vision-Language Models
🎨 Generative AI · Content type: Academic · arxiv.org · 6 days ago
Do Vision-Language Models See or Guess? Measuring and Reducing Textual-Prior Reliance with a Phrasing-Controlled Benchmark
🧠 LLM · Content type: Academic · arxiv.org · 1 day ago
Do VLMs See What Sensors Feel? A Scalable Expert-Guided Design for Wheelchair Accessibility Assessment from Street View
🎨 Generative AI · Content type: Academic · arxiv.org · 2 days ago
Textual Supervision Enhances Geospatial Representations in Vision-Language Models
🧠 LLM · Content type: Academic · arxiv.org · 3 days ago