Multimodal AI

VL-DINO: Leveraging CLIP Vision-Language Knowledge for Open-Vocabulary Object Detection

 🎨Generative AI  Content type: Academic
arxiv.org·

From Prompts to Tokens: Internalizing Causal Supervision in Vision-Language Model for Multi-Image Causal Reasoning

 🎨Generative AI  Content type: Academic
arxiv.org·

Decoding Pedestrian Crossing Intention from Egocentric Vision via Vision Language Models

 🤖Large Language Models  Content type: Academic
arxiv.org·

Diagnosing Visual Ignorance in Vision-Language Models

 🧠LLM  Content type: Academic
arxiv.org·

The Last Visible Pixel: Probing Fine-Scale Perception in Vision-Language Models

 🧠LLM  Content type: Academic
arxiv.org·

Vision Language Model Helps Private Information De-Identification in Vision Data

 🎨Generative AI  Content type: Academic
arxiv.org·

Almieyar-Oryx-BloomBench: A Bilingual Multimodal Benchmark for Cognitively Informed Evaluation of Vision-Language Models

 🎨Generative AI  Content type: Academic
arxiv.org·

OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics

 🤖AI Agents  Content type: Academic
arxiv.org·

EasyLens: A Training-Free Plug-and-Play Subtle-Lesion Representation Amplifier for Medical Vision-Language Models

 🎨Generative AI  Content type: Academic
arxiv.org·

AgenticNav: Zero-Shot Vision-and-Language Navigation as a Tool-Calling Harness

 🎨Generative AI  Content type: Academic
arxiv.org·

Do VLMs Reason Like Engineers? A Benchmark and a Stage-wise Evaluation

 💬Prompt Engineering  Content type: Academic
arxiv.org·

PlanBench-V: A Spatial Planning Map Benchmark for Vision-Language Models

 🤖ChatGPT  Content type: Academic
arxiv.org·

CLASP: Language-Driven Robot Skill Selection and Composition using Task-Parameterized Learning

 💬Prompt Engineering  Content type: Academic
arxiv.org·

Adversarial Attacks Already Tell the Answer: Directional Bias-Guided Test-time Defense for Vision-Language Models

 🎨Generative AI  Content type: Academic
arxiv.org·

A Dataset for Dynamic Human Preferences for Vision Language Models

 🧠LLM  Content type: Academic
arxiv.org·

Harnessing Streaming Video in the Wild

 🔍Feed Discovery  Content type: Academic
arxiv.org·

Seeing Time: Benchmarking Chronological Reasoning and Shortcut Biases in Vision-Language Models

 🎨Generative AI  Content type: Academic
arxiv.org·

Do Vision-Language Models See or Guess? Measuring and Reducing Textual-Prior Reliance with a Phrasing-Controlled Benchmark

 🧠LLM  Content type: Academic
arxiv.org·

Do VLMs See What Sensors Feel? A Scalable Expert-Guided Design for Wheelchair Accessibility Assessment from Street View

 🎨Generative AI  Content type: Academic
arxiv.org·

Textual Supervision Enhances Geospatial Representations in Vision-Language Models

 🧠LLM  Content type: Academic
arxiv.org·
