🖼️ Multimodal AI
vision-language models, VLM, image-text, multimodal learning
World Model Self-Distillation: Training World Models to Solve General Tasks
🛰️ Geospatial AI · Content type: Academic · arxiv.org · 22 hours ago
One Stone, Three Birds: Self-adaptive Optimal Transport for Multi-VLM Selection, Adaptation, and Ensembling
🛰️ Geospatial AI · Content type: Academic · arxiv.org · 2 days ago
GP-Adapter: Gaussian Process CLIP-Adapter for Few-Shot Out-of-Distribution Detection
📊 OOD Detection · Content type: Academic · arxiv.org · 3 days ago
Latent World Recovery for Multimodal Learning with Missing Modalities
🏷️ Label Noise · Content type: Academic · arxiv.org · 22 hours ago
Two Bridges, One Pathway: From VLMs to Generalizable VLAs with Embodied Trajectory-Coupled Data
📊 OOD Detection · Content type: Academic · arxiv.org · 2 days ago
Information-Theoretic Decomposition for Multimodal Interaction Learning
🔍 Fine-Grained Classification · Content type: Academic · arxiv.org · 22 hours ago
MLingualFC: Evaluating Jailbreak Vulnerabilities in Multilingual Vision-Language Models
🧠 LLMs · Content type: Academic · arxiv.org · 2 days ago
Diagnosing Visual Ignorance in Vision-Language Models
🏷️ Label Noise · Content type: Academic · arxiv.org · 3 days ago
Metadata-Aware Multi-Prompt Reasoning for Zero-Shot Accident Understanding
✍️ Prompt Engineering · Content type: Academic · arxiv.org · 22 hours ago
CLASP: Language-Driven Robot Skill Selection and Composition using Task-Parameterized Learning
✍️ Prompt Engineering · Content type: Academic · arxiv.org · 2 days ago
UltraVR: A Diagnostic Ultra-Resolution Image-VQA Benchmark for Evidence-Grounded Reasoning
🛰️ Geospatial AI · Content type: Academic · arxiv.org · 6 days ago
Parameter-Efficient Adapter Tuning for Tabular-Image Multimodal Learning
🔍 Fine-Grained Classification · Content type: Academic · arxiv.org · 22 hours ago
The Last Visible Pixel: Probing Fine-Scale Perception in Vision-Language Models
🔍 Fine-Grained Classification · Content type: Academic · arxiv.org · 2 days ago
SS-TPT: Stability and Suitability-Guided Test-Time Prompt Tuning for Adversarially Robust Vision-Language Models
⚙️ MLOps · Content type: Academic · arxiv.org · 3 days ago
OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics
🤖 AI Agents · Content type: Academic · arxiv.org · 2 days ago
Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models
✍️ Prompt Engineering · Content type: Academic · arxiv.org · 22 hours ago
Vision Language Model Helps Private Information De-Identification in Vision Data
🛰️ Geospatial AI · Content type: Academic · arxiv.org · 2 days ago
DIRECT: When and Where Should You Allocate Test-Time Compute in Embodied Planners?
✍️ Prompt Engineering · Content type: Academic · arxiv.org · 22 hours ago
CheXanatomy: Anatomy-Aware Vision-Language Modeling for Chest Radiographs
👁️ Computer Vision · Content type: Academic · arxiv.org · 2 days ago
Attention Consistent Longitudinal Medical Visual Question Answering Guided by Vision Foundation Models
🔍 Fine-Grained Classification · Content type: Academic · arxiv.org · 3 days ago