Multimodal AI


World Model Self-Distillation: Training World Models to Solve General Tasks

🛰️ Geospatial AI · Content type: Academic
arxiv.org

One Stone, Three Birds: Self-adaptive Optimal Transport for Multi-VLM Selection, Adaptation, and Ensembling

🛰️ Geospatial AI · Content type: Academic
arxiv.org

GP-Adapter: Gaussian Process CLIP-Adapter for Few-Shot Out-of-Distribution Detection

📊 OOD Detection · Content type: Academic
arxiv.org

Latent World Recovery for Multimodal Learning with Missing Modalities

🏷️ Label Noise · Content type: Academic
arxiv.org

Two Bridges, One Pathway: From VLMs to Generalizable VLAs with Embodied Trajectory-Coupled Data

📊 OOD Detection · Content type: Academic
arxiv.org

Information-Theoretic Decomposition for Multimodal Interaction Learning

🔍 Fine-Grained Classification · Content type: Academic
arxiv.org

MLingualFC: Evaluating Jailbreak Vulnerabilities in Multilingual Vision-Language Models

🧠 LLMs · Content type: Academic
arxiv.org

Diagnosing Visual Ignorance in Vision-Language Models

🏷️ Label Noise · Content type: Academic
arxiv.org

Metadata-Aware Multi-Prompt Reasoning for Zero-Shot Accident Understanding

✍️ Prompt Engineering · Content type: Academic
arxiv.org

CLASP: Language-Driven Robot Skill Selection and Composition using Task-Parameterized Learning

✍️ Prompt Engineering · Content type: Academic
arxiv.org

UltraVR: A Diagnostic Ultra-Resolution Image-VQA Benchmark for Evidence-Grounded Reasoning

🛰️ Geospatial AI · Content type: Academic
arxiv.org

Parameter-Efficient Adapter Tuning for Tabular-Image Multimodal Learning

🔍 Fine-Grained Classification · Content type: Academic
arxiv.org

The Last Visible Pixel: Probing Fine-Scale Perception in Vision-Language Models

🔍 Fine-Grained Classification · Content type: Academic
arxiv.org

SS-TPT: Stability and Suitability-Guided Test-Time Prompt Tuning for Adversarially Robust Vision-Language Models

⚙️ MLOps · Content type: Academic
arxiv.org

OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics

🤖 AI Agents · Content type: Academic
arxiv.org

Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models

✍️ Prompt Engineering · Content type: Academic
arxiv.org

Vision Language Model Helps Private Information De-Identification in Vision Data

🛰️ Geospatial AI · Content type: Academic
arxiv.org

DIRECT: When and Where Should You Allocate Test-Time Compute in Embodied Planners?

✍️ Prompt Engineering · Content type: Academic
arxiv.org

CheXanatomy: Anatomy-Aware Vision-Language Modeling for Chest Radiographs

👁️ Computer Vision · Content type: Academic
arxiv.org

Attention Consistent Longitudinal Medical Visual Question Answering Guided by Vision Foundation Models

🔍 Fine-Grained Classification · Content type: Academic
arxiv.org