🔮 Multimodal AI
multimodal, vision-language, audio-visual, foundation models
A Controlled Audit of Pretraining Contamination in Public Medical Vision-Language Benchmarks
🧠 LLM Research · Content type: Academic · arxiv.org · 9 hours ago
Explicit Representation Alignment for Multimodal Sentiment Analysis
👁️ Computer Vision · Content type: Academic · arxiv.org · 1 day ago
DB-3DME: From Dataset to Benchmark for Human-aligned Automatic 3D Mesh Evaluation
🛡️ AI Safety · Content type: Academic · arxiv.org · 9 hours ago
Diagnosing Visual Ignorance in Vision-Language Models
👁️ Computer Vision · Content type: Academic · arxiv.org · 2 days ago
Geometric Coastline Localization using Vision-Language Models
👁️ Computer Vision · Content type: Academic · arxiv.org · 9 hours ago
One Stone, Three Birds: Self-adaptive Optimal Transport for Multi-VLM Selection, Adaptation, and Ensembling
🛡️ AI Safety · Content type: Academic · arxiv.org · 1 day ago
Multimodal Brain Tumour Classification Using Feature Fusion
👁️ Computer Vision · Content type: Academic · arxiv.org · 9 hours ago
Two Bridges, One Pathway: From VLMs to Generalizable VLAs with Embodied Trajectory-Coupled Data
🧠 LLM Research · Content type: Academic · arxiv.org · 1 day ago
The Last Visible Pixel: Probing Fine-Scale Perception in Vision-Language Models
👁️ Computer Vision · Content type: Academic · arxiv.org · 1 day ago
MMClima: A Framework for Multimodal Climate Science Data and Evaluation
🧠 LLM Research · Content type: Academic · arxiv.org · 9 hours ago
Almieyar-Oryx-BloomBench: A Bilingual Multimodal Benchmark for Cognitively Informed Evaluation of Vision-Language Models
🧠 LLM Research · Content type: Academic · arxiv.org · 5 days ago
MotionEnhancer: Leveraging Video Diffusion for Motion-Enhanced Vision-Language Models
👁️ Computer Vision · Content type: Academic · arxiv.org · 2 days ago
M$^3$Exam: Benchmarking Multimodal Memory for Realistic User-Agent Interactions
🤖 AI Engineering · Content type: Academic · arxiv.org · 2 days ago
Earth-OneVision: Extending Remote Sensing Multimodal Large Language Models to More Sensor Modalities and Tasks
🧠 LLM Research · Content type: Academic · arxiv.org · 9 hours ago
Seeing Time: Benchmarking Chronological Reasoning and Shortcut Biases in Vision-Language Models
👁️ Computer Vision · Content type: Academic · arxiv.org · 5 days ago
Vision Language Model Helps Private Information De-Identification in Vision Data
👁️ Computer Vision · Content type: Academic · arxiv.org · 1 day ago
3D-CoS: A New 3D Reconstruction Paradigm Based on VLM Code Synthesis
👁️ Computer Vision · Content type: Academic · arxiv.org · 9 hours ago
Stateful Visual Encoders for Vision-Language Models
👁️ Computer Vision · Content type: Academic · arxiv.org · 6 days ago
OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics
🎯 Reinforcement Learning · Content type: Academic · arxiv.org · 1 day ago
Modeling Complex Behaviors: Multi-Personality Composition and Dynamic Switching in Vision-Language Models
🧠 LLM Research · Content type: Academic · arxiv.org · 9 hours ago