🎭 Multimodal AI
multimodal, vision-language, audio-visual, cross-modal
Benchmarking Large Language Models for Safety Data Extraction
💡 AI Reasoning · Content type: Academic · arxiv.org · 12 hours ago
Two Bridges, One Pathway: From VLMs to Generalizable VLAs with Embodied Trajectory-Coupled Data
👁️ VLMs · Content type: Academic · arxiv.org · 2 days ago
MultiToP: Learning to Patch Visual Tokens to Mitigate Hallucinations in Video Large Multimodal Models
👁️ VLMs · Content type: Academic · arxiv.org · 12 hours ago
Explicit Representation Alignment for Multimodal Sentiment Analysis
👁️ VLMs · Content type: Academic · arxiv.org · 2 days ago
FreqKD: Frequency-Decoupled Cross-Modal Knowledge Distillation for Infrared Object Detection
👁️ VLMs · Content type: Academic · arxiv.org · 12 hours ago
Do VLMs Reason Like Engineers? A Benchmark and a Stage-wise Evaluation
👁️ VLMs · Content type: Academic · arxiv.org · 1 day ago
Traits Run Deeper: Trait-Specific Asymmetric Fusion for Personality Assessment
🔓 Open-source Models · Content type: Academic · arxiv.org · 12 hours ago
Decoding Pedestrian Crossing Intention from Egocentric Vision via Vision Language Models
👁️ VLMs · Content type: Academic · arxiv.org · 2 days ago
Textual Supervision Enhances Geospatial Representations in Vision-Language Models
👁️ VLMs · Content type: Academic · arxiv.org · 3 days ago
One Token per Multimodal Evidence: Latent Memory for Resource-Constrained QA
👁️ VLMs · Content type: Academic · arxiv.org · 1 day ago
MSUE: Multi-Modal Soccer Understanding Expert
👁️ VLMs · Content type: Academic · arxiv.org · 12 hours ago
MemoVAD: Resource-Efficient Video Anomaly Detection via Dynamic Semantic Memory in Edge Computing Scenarios
👁️ VLMs · Content type: Academic · arxiv.org · 2 days ago
Seeing Time: Benchmarking Chronological Reasoning and Shortcut Biases in Vision-Language Models
👁️ VLMs · Content type: Academic · arxiv.org · 6 days ago
DIRECT: When and Where Should You Allocate Test-Time Compute in Embodied Planners?
🖥️ Inference Compute · Content type: Academic · arxiv.org · 12 hours ago
Vision Language Model Helps Private Information De-Identification in Vision Data
👁️ VLMs · Content type: Academic · arxiv.org · 2 days ago
Almieyar-Oryx-BloomBench: A Bilingual Multimodal Benchmark for Cognitively Informed Evaluation of Vision-Language Models
👁️ VLMs · Content type: Academic · arxiv.org · 6 days ago
AgenticNav: Zero-Shot Vision-and-Language Navigation as a Tool-Calling Harness
👁️ VLMs · Content type: Academic · arxiv.org · 1 day ago
Cross-Modal Benchmarking for Robotic Perception in Natural Environments
👁️ VLMs · Content type: Academic · arxiv.org · 12 hours ago
Harnessing Streaming Video in the Wild
👁️ VLMs · Content type: Academic · arxiv.org · 2 days ago
LEVANTE-bench: Multi-Scale Comparison of VLMs to Children Using Cognitive Tasks (or, "Is Your VLM Smarter Than a 5th Grader?")
👁️ VLMs · Content type: Academic · arxiv.org · 6 days ago