Scour
🔮 Multimodal AI
multimodal, vision-language, audio-visual, foundation models
Scoured 796 posts in 3.9 ms
3D-CoS: A New 3D Reconstruction Paradigm Based on VLM Code Synthesis
👁️ Computer Vision · Content type: Academic · arxiv.org · 14 hours ago
Decoding Pedestrian Crossing Intention from Egocentric Vision via Vision Language Models
🛡️ AI Safety · Content type: Academic · arxiv.org · 1 day ago
Beyond Symmetric Alignment: Spectral Diagnostics of Modality Imbalance in Vision-Language Models in the Medical Domain
👁️ Computer Vision · Content type: Academic · arxiv.org · 6 days ago
Modeling Complex Behaviors: Multi-Personality Composition and Dynamic Switching in Vision-Language Models
🧠 LLM Research · Content type: Academic · arxiv.org · 14 hours ago
AnyMod-LLVE: Low-Light Video Enhancement with Modality-Agnostic Inference
🧠 LLM Research · Content type: Academic · arxiv.org · 14 hours ago
CLASP: Language-Driven Robot Skill Selection and Composition using Task-Parameterized Learning
🛡️ AI Safety · Content type: Academic · arxiv.org · 1 day ago
NextMotionQA: Benchmarking and Judging Human Motion Understanding with Vision-Language Models
🤖 Robotics · Content type: Academic · arxiv.org · 6 days ago
Harnessing Streaming Video in the Wild
🤖 AI Engineering · Content type: Academic · arxiv.org · 1 day ago
From Senses to Decisions: The Information Flow of Auditory and Visual Perception in Multimodal LLMs
🧠 LLM Research · Content type: Academic · arxiv.org · 14 hours ago
A Dataset for Dynamic Human Preferences for Vision Language Models
🧠 LLM Research · Content type: Academic · arxiv.org · 1 day ago
Adversarial Attacks Already Tell the Answer: Directional Bias-Guided Test-time Defense for Vision-Language Models
👁️ Computer Vision · Content type: Academic · arxiv.org · 5 days ago
Do VLMs See What Sensors Feel? A Scalable Expert-Guided Design for Wheelchair Accessibility Assessment from Street View
👁️ Computer Vision · Content type: Academic · arxiv.org · 1 day ago
UniCanvas: A Diffusion-base Unified Model for Text-in-Image Joint Generation
👁️ Computer Vision · Content type: Academic · arxiv.org · 6 days ago
MemoVAD: Resource-Efficient Video Anomaly Detection via Dynamic Semantic Memory in Edge Computing Scenarios
🤖 AI Engineering · Content type: Academic · arxiv.org · 1 day ago
Anchored, Not Graded: Vision-Language Models Fail at Slant-from-Texture Perception
🧠 LLM Research · Content type: Academic · arxiv.org · 2 days ago
EasyLens: A Training-Free Plug-and-Play Subtle-Lesion Representation Amplifier for Medical Vision-Language Models
👁️ Computer Vision · Content type: Academic · arxiv.org · 5 days ago
DyCo-RL: Dynamic Cross-Modal Coordination for Visual Reasoning
🎯 Reinforcement Learning · Content type: Academic · arxiv.org · 1 day ago
Query-based Cross-Modal Projector Bolstering Mamba Multimodal LLM
🧠 LLM Research · Content type: Academic · arxiv.org · 6 days ago
ChinaHeritaQA: A Culturally-Grounded Visual Question Answering Dataset for World Heritage Sites in China
🧠 LLM Research · Content type: Academic · arxiv.org · 1 day ago
Stream3D-VLM: Online 3D Spatial Understanding with Incremental Geometry Priors
👁️ Computer Vision · Content type: Academic · arxiv.org · 2 days ago