Multimodal AI


An Effective Router for Vision-Language Model Selection

 🧠LLM  Content type: Academic
arxiv.org·

NVlabs/Eagle: Eagle: Frontier Vision-Language Models with Data-Centric Strategies

 🎨Generative AI  Content type: Code
github.com·

Do VLMs Reason Like Engineers? A Benchmark and a Stage-wise Evaluation

 ✍️Prompt Engineering  Content type: Academic
arxiv.org·

One Token per Multimodal Evidence: Latent Memory for Resource-Constrained QA

 🤖Large Language Models  Content type: Academic
arxiv.org·

MLingualFC: Evaluating Jailbreak Vulnerabilities in Multilingual Vision-Language Models

 🔐Cryptography  Content type: Academic
arxiv.org·

Almieyar-Oryx-BloomBench: A Bilingual Multimodal Benchmark for Cognitively Informed Evaluation of Vision-Language Models

 🧠LLM  Content type: Academic
arxiv.org·

Modeling Complex Behaviors: Multi-Personality Composition and Dynamic Switching in Vision-Language Models

 🧠LLM  Content type: Academic
arxiv.org·

Are Reasoning Vision-Language Models Robust to Semantic Visual Distractions?

 🎭Anthropic Claude  Content type: Academic
arxiv.org·

Explicit Representation Alignment for Multimodal Sentiment Analysis

 🤗Hugging Face  Content type: Academic
arxiv.org·

Stateful Visual Encoders for Vision-Language Models

 🎯AI Agents  Content type: Academic
arxiv.org·

Textual Supervision Enhances Geospatial Representations in Vision-Language Models

 🧠LLM  Content type: Academic
arxiv.org·

Late-Layer Fusion is Enough: Dual-Path Vision Token Routing for Multimodal Large Language Models under Visual Saturation

 🎨Generative AI  Content type: Academic
arxiv.org·

LEVANTE-bench: Multi-Scale Comparison of VLMs to Children Using Cognitive Tasks (or, "Is Your VLM Smarter Than a 5th Grader?")

 🤖LLM Inference  Content type: Academic
arxiv.org·

The Last Visible Pixel: Probing Fine-Scale Perception in Vision-Language Models

 🧠LLM  Content type: Academic
arxiv.org·

UniCanvas: A Diffusion-based Unified Model for Text-in-Image Joint Generation

 🎨Generative AI  Content type: Academic
arxiv.org·

One Stone, Three Birds: Self-adaptive Optimal Transport for Multi-VLM Selection, Adaptation, and Ensembling

 🤖LLM Inference  Content type: Academic
arxiv.org·

Two Bridges, One Pathway: From VLMs to Generalizable VLAs with Embodied Trajectory-Coupled Data

 🤖LLM Inference  Content type: Academic
arxiv.org·

Seeing Time: Benchmarking Chronological Reasoning and Shortcut Biases in Vision-Language Models

 🎨Generative AI  Content type: Academic
arxiv.org·

Decoding Pedestrian Crossing Intention from Egocentric Vision via Vision Language Models

 🎭Anthropic Claude  Content type: Academic
arxiv.org·

MotionEnhancer: Leveraging Video Diffusion for Motion-Enhanced Vision-Language Models

 🎨Generative AI  Content type: Academic
arxiv.org·
