Multimodal AI

MemoVAD: Resource-Efficient Video Anomaly Detection via Dynamic Semantic Memory in Edge Computing Scenarios

 🎨Generative AI  Content type: Academic
arxiv.org

Anchored, Not Graded: Vision-Language Models Fail at Slant-from-Texture Perception

 🧠LLM  Content type: Academic
arxiv.org

A Controlled Audit of Pretraining Contamination in Public Medical Vision-Language Benchmarks

 🤖Large Language Models  Content type: Academic
arxiv.org

DB-3DME: From Dataset to Benchmark for Human-aligned Automatic 3D Mesh Evaluation

 💬Natural Language Processing  Content type: Academic
arxiv.org

UltraVR: A Diagnostic Ultra-Resolution Image-VQA Benchmark for Evidence-Grounded Reasoning

 🤖Large Language Models  Content type: Academic
arxiv.org

Readable Yet Unpredictable: Rotated-Outcome Prediction in Vision-Language Models

 🧠LLM  Content type: Academic
arxiv.org

DRIFT: A Residual Flow Adapter for Decoding Continuous Outputs in Vision-Language Models

 🧠LLM  Content type: Academic
arxiv.org

ReCoVLA: VLM-Guided Reward Compilation for Failure Recovery in Vision-Language-Action Policies

 💬Prompt Engineering  Content type: Academic
arxiv.org

Geometric Coastline Localization using Vision-Language Models

 🎨Generative AI  Content type: Academic
arxiv.org

SS-TPT: Stability and Suitability-Guided Test-Time Prompt Tuning for Adversarially Robust Vision-Language Models

 💬Prompt Engineering  Content type: Academic
arxiv.org

DriveReward: A Comprehensive Dataset and Generative Vision-Language Reward Model for Autonomous Driving

 🎨Generative AI  Content type: Academic
arxiv.org

ChinaHeritaQA: A Culturally-Grounded Visual Question Answering Dataset for World Heritage Sites in China

 🧠LLMs  Content type: Academic
arxiv.org

A Conversational Framework for Human-Robot Collaborative Manipulation with Distributed Generative AI models

 🎨Generative AI  Content type: Academic
arxiv.org

TABVERSE: Benchmarking Cross-Format Table Understanding in LLMs and VLMs

 🧠LLM  Content type: Academic
arxiv.org

Thinking with Imagination: Agentic Visual Spatial Reasoning with World Simulators

 💬Prompt Engineering  Content type: Academic
arxiv.org

One Token per Multimodal Evidence: Latent Memory for Resource-Constrained QA

 💬Prompt Engineering  Content type: Academic
arxiv.org

CheXanatomy: Anatomy-Aware Vision-Language Modeling for Chest Radiographs

 🎨Generative AI  Content type: Academic
arxiv.org

AffordanceVLA: A Vision-Language-Action Model Empowering Action Generation through Affordance-Aware Understanding

 🎨Generative AI  Content type: Academic
arxiv.org

UNIVID: Unified Vision-Language Model for Video Moderation

 🎨Generative AI  Content type: Academic
arxiv.org

Would you still call this Dax? Novel Visual References in VLMs and Humans

 🎨Generative AI  Content type: Academic
arxiv.org
