Multimodal AI

A Controlled Audit of Pretraining Contamination in Public Medical Vision-Language Benchmarks

 🧠LLM Research  Content type: Academic
arxiv.org

Explicit Representation Alignment for Multimodal Sentiment Analysis

 👁️Computer Vision  Content type: Academic
arxiv.org

DB-3DME: From Dataset to Benchmark for Human-aligned Automatic 3D Mesh Evaluation

 🛡️AI Safety  Content type: Academic
arxiv.org

Diagnosing Visual Ignorance in Vision-Language Models

 👁️Computer Vision  Content type: Academic
arxiv.org

Geometric Coastline Localization using Vision-Language Models

 👁️Computer Vision  Content type: Academic
arxiv.org

One Stone, Three Birds: Self-adaptive Optimal Transport for Multi-VLM Selection, Adaptation, and Ensembling

 🛡️AI Safety  Content type: Academic
arxiv.org

Multimodal Brain Tumour Classification Using Feature Fusion

 👁️Computer Vision  Content type: Academic
arxiv.org

Two Bridges, One Pathway: From VLMs to Generalizable VLAs with Embodied Trajectory-Coupled Data

 🧠LLM Research  Content type: Academic
arxiv.org

The Last Visible Pixel: Probing Fine-Scale Perception in Vision-Language Models

 👁️Computer Vision  Content type: Academic
arxiv.org

MMClima: A Framework for Multimodal Climate Science Data and Evaluation

 🧠LLM Research  Content type: Academic
arxiv.org

Almieyar-Oryx-BloomBench: A Bilingual Multimodal Benchmark for Cognitively Informed Evaluation of Vision-Language Models

 🧠LLM Research  Content type: Academic
arxiv.org

MotionEnhancer: Leveraging Video Diffusion for Motion-Enhanced Vision-Language Models

 👁️Computer Vision  Content type: Academic
arxiv.org

M$^3$Exam: Benchmarking Multimodal Memory for Realistic User-Agent Interactions

 🤖AI Engineering  Content type: Academic
arxiv.org

Earth-OneVision: Extending Remote Sensing Multimodal Large Language Models to More Sensor Modalities and Tasks

 🧠LLM Research  Content type: Academic
arxiv.org

Seeing Time: Benchmarking Chronological Reasoning and Shortcut Biases in Vision-Language Models

 👁️Computer Vision  Content type: Academic
arxiv.org

Vision Language Model Helps Private Information De-Identification in Vision Data

 👁️Computer Vision  Content type: Academic
arxiv.org

3D-CoS: A New 3D Reconstruction Paradigm Based on VLM Code Synthesis

 👁️Computer Vision  Content type: Academic
arxiv.org

Stateful Visual Encoders for Vision-Language Models

 👁️Computer Vision  Content type: Academic
arxiv.org

OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics

 🎯Reinforcement Learning  Content type: Academic
arxiv.org

Modeling Complex Behaviors: Multi-Personality Composition and Dynamic Switching in Vision-Language Models

 🧠LLM Research  Content type: Academic
arxiv.org
