Multimodal AI
Vision-Language, Image-Text, Multimodal Models
Scoured 455 posts in 7.3 ms
MemoVAD: Resource-Efficient Video Anomaly Detection via Dynamic Semantic Memory in Edge Computing Scenarios
🎨 Generative AI · Content type: Academic · arxiv.org · 2 days ago
Anchored, Not Graded: Vision-Language Models Fail at Slant-from-Texture Perception
🧠 LLM · Content type: Academic · arxiv.org · 3 days ago
A Controlled Audit of Pretraining Contamination in Public Medical Vision-Language Benchmarks
🤖 Large Language Models · Content type: Academic · arxiv.org · 1 day ago
DB-3DME: From Dataset to Benchmark for Human-aligned Automatic 3D Mesh Evaluation
💬 Natural Language Processing · Content type: Academic · arxiv.org · 1 day ago
UltraVR: A Diagnostic Ultra-Resolution Image-VQA Benchmark for Evidence-Grounded Reasoning
🤖 Large Language Models · Content type: Academic · arxiv.org · 6 days ago
Readable Yet Unpredictable: Rotated-Outcome Prediction in Vision-Language Models
🧠 LLM · Content type: Academic · arxiv.org · 2 days ago
DRIFT: A Residual Flow Adapter for Decoding Continuous Outputs in Vision-Language Models
🧠 LLM · Content type: Academic · arxiv.org · 6 days ago
ReCoVLA: VLM-Guided Reward Compilation for Failure Recovery in Vision-Language-Action Policies
💬 Prompt Engineering · Content type: Academic · arxiv.org · 2 days ago
Geometric Coastline Localization using Vision-Language Models
🎨 Generative AI · Content type: Academic · arxiv.org · 1 day ago
SS-TPT: Stability and Suitability-Guided Test-Time Prompt Tuning for Adversarially Robust Vision-Language Models
💬 Prompt Engineering · Content type: Academic · arxiv.org · 3 days ago
DriveReward: A Comprehensive Dataset and Generative Vision-Language Reward Model for Autonomous Driving
🎨 Generative AI · Content type: Academic · arxiv.org · 2 days ago
ChinaHeritaQA: A Culturally-Grounded Visual Question Answering Dataset for World Heritage Sites in China
🧠 LLMs · Content type: Academic · arxiv.org · 2 days ago
A Conversational Framework for Human-Robot Collaborative Manipulation with Distributed Generative AI models
🎨 Generative AI · Content type: Academic · arxiv.org · 6 days ago
TABVERSE: Benchmarking Cross-Format Table Understanding in LLMs and VLMs
🧠 LLM · Content type: Academic · arxiv.org · 2 days ago
Thinking with Imagination: Agentic Visual Spatial Reasoning with World Simulators
💬 Prompt Engineering · Content type: Academic · arxiv.org · 6 days ago
One Token per Multimodal Evidence: Latent Memory for Resource-Constrained QA
💬 Prompt Engineering · Content type: Academic · arxiv.org · 1 day ago
CheXanatomy: Anatomy-Aware Vision-Language Modeling for Chest Radiographs
🎨 Generative AI · Content type: Academic · arxiv.org · 2 days ago
AffordanceVLA: A Vision-Language-Action Model Empowering Action Generation through Affordance-Aware Understanding
🎨 Generative AI · Content type: Academic · arxiv.org · 6 days ago
UNIVID: Unified Vision-Language Model for Video Moderation
🎨 Generative AI · Content type: Academic · arxiv.org · 6 days ago
Would you still call this Dax? Novel Visual References in VLMs and Humans
🎨 Generative AI · Content type: Academic · arxiv.org · 6 days ago