👁️ VLMs
vision language models, visual LLM, multimodal model
Scoured 258 posts in 4.7 ms
Personal AI Agent for Camera Roll VQA
🕵️ AI Agents · Content type: Academic · arxiv.org · 6 days ago
Less-relevant results
World Model Self-Distillation: Training World Models to Solve General Tasks
🎭 Multimodal AI · Content type: Academic · arxiv.org · 10 hours ago
Readable Yet Unpredictable: Rotated-Outcome Prediction in Vision-Language Models
🎭 Multimodal AI · Content type: Academic · arxiv.org · 2 days ago
ChinaHeritaQA: A Culturally-Grounded Visual Question Answering Dataset for World Heritage Sites in China
🎭 Multimodal AI · Content type: Academic · arxiv.org · 2 days ago
GP-Adapter: Gaussian Process CLIP-Adapter for Few-Shot Out-of-Distribution Detection
🎭 Multimodal AI · Content type: Academic · arxiv.org · 3 days ago
Are Reasoning Vision-Language Models Robust to Semantic Visual Distractions?
🎭 Multimodal AI · Content type: Academic · arxiv.org · 2 days ago
Adversarial Attacks Already Tell the Answer: Directional Bias-Guided Test-time Defense for Vision-Language Models
🎭 Multimodal AI · Content type: Academic · arxiv.org · 6 days ago
Aligned but Not Partner-Specific: Distinguishing How Multimodal LLM Agents Succeed in Reference Games Without Human-Like Conventions
🔧 Tool Use · Content type: Academic · arxiv.org · 2 days ago
SS-TPT: Stability and Suitability-Guided Test-Time Prompt Tuning for Adversarially Robust Vision-Language Models
🎭 Multimodal AI · Content type: Academic · arxiv.org · 3 days ago
VQA for Dynamic Portfolio Optimization: Sampling Strategies, Optimizer Scheduling, and Hardware-Aware Ansatz Design
⚡ Quantization · Content type: Academic · arxiv.org · 1 day ago
APT: Action Expert Pretraining Improves Instruction Generalization of Vision-Language-Action Policies
🎭 Multimodal AI · Content type: Academic · arxiv.org · 10 hours ago
Do Vision-Language Models See or Guess? Measuring and Reducing Textual-Prior Reliance with a Phrasing-Controlled Benchmark
🎭 Multimodal AI · Content type: Academic · arxiv.org · 1 day ago
TEVI: Text-Conditioned Editing of Visual Representations via Sparse Autoencoders for Improved Vision-Language Alignment
🎭 Multimodal AI · Content type: Academic · arxiv.org · 3 days ago
ProcessThinker: Enhancing Multi-modal Large Language Models Reasoning via Rollout-based Process Reward
💡 AI Reasoning · Content type: Academic · arxiv.org · 10 hours ago
FADA: Accessible fetal ultrasound interpretation and annotation with a selectively distilled unified vision-language model
🎭 Multimodal AI · Content type: Academic · arxiv.org · 1 day ago
When CLIP Sees More, It Fights Back Harder: Multi-View Guided Adaptive Counterattacks for Test-Time Adversarial Robustness
🎭 Multimodal AI · Content type: Academic · arxiv.org · 3 days ago
Steer Where It Matters: Token-Level Visual-Sensitivity Steering for LVLMs Hallucination Mitigation
🎭 Multimodal AI · Content type: Academic · arxiv.org · 2 days ago
DIRECT: When and Where Should You Allocate Test-Time Compute in Embodied Planners?
🖥️ Inference Compute · Content type: Academic · arxiv.org · 10 hours ago
SD-GRPO: Verifiable Segment Decomposition for Long-Form Vision-Language Generation
🧠 LLMs · Content type: Academic · arxiv.org · 1 day ago
UNIVID: Unified Vision-Language Model for Video Moderation
🎭 Multimodal AI · Content type: Academic · arxiv.org · 6 days ago