🎭 Multimodal AI
Specific · multimodal, vision-language, audio-visual, cross-modal
Scoured 452 posts in 7.2 ms
OpenCV 5.0 Released With Rewritten DNN Engine, Built-In LLM & VLM Support · 👁️ VLMs · phoronix.com · 5 days ago · Hacker News
Task-Aligned Stability Analysis of Vision-Language Models for Autonomous Driving Hazard Detection · 👁️ VLMs · Content type: Academic · arxiv.org · 10 hours ago
Less-relevant results
fix(microsoft-foundry): filter unsupported Anthropic deployments · openclaw/openclaw@1240de7 · 🔧 Tool Use · Content type: Code · github.com · 2 days ago
4DP-QA: Scalable QA for 4D Perception in Vision Language Models · 👁️ VLMs · Content type: Academic · arxiv.org · 10 hours ago
VL-DINO: Leveraging CLIP Vision-Language Knowledge for Open-Vocabulary Object Detection · 👁️ VLMs · Content type: Academic · arxiv.org · 10 hours ago
linzhiqiu/t2v_metrics: Evaluating text-to-image/video/3D models with VQAScore · 🔓 Open-source Models · Content type: Code · github.com · 1 day ago · Hacker News
APT: Action Expert Pretraining Improves Instruction Generalization of Vision-Language-Action Policies · 🤖 Embodied AI · Content type: Academic · arxiv.org · 10 hours ago
AVIS: Adaptive Test-Time Scaling for Vision-Language Models · 🖥️ Inference Compute · Content type: Academic · arxiv.org · 10 hours ago
Reroute, Don't Remove: Recoverable Visual Token Routing for Vision-Language Models · 👁️ VLMs · Content type: Academic · arxiv.org · 10 hours ago
When to Align, When to Predict: A Phase Diagram for Multimodal Learning · 👁️ VLMs · Content type: Academic · arxiv.org · 1 day ago
GoodQ02/goodq4all: Local-first multimodal epistemic memory for scene-level video, audio, and text intelligence. · ⚡ Quantization · Content type: Code · github.com · 3 days ago · Hacker News
MLingualFC: Evaluating Jailbreak Vulnerabilities in Multilingual Vision-Language Models · 👁️ VLMs · Content type: Academic · arxiv.org · 2 days ago
From Prompts to Tokens: Internalizing Causal Supervision in Vision-Language Model for Multi-Image Causal Reasoning · 👁️ VLMs · Content type: Academic · arxiv.org · 10 hours ago
Are Reasoning Vision-Language Models Robust to Semantic Visual Distractions? · 👁️ VLMs · Content type: Academic · arxiv.org · 2 days ago
fix: preserve Foundry Responses reasoning replay ids · openclaw/openclaw@248dfb2 · 🔧 Tool Use · Content type: Code · github.com · 4 days ago
World Model Self-Distillation: Training World Models to Solve General Tasks · 👁️ VLMs · Content type: Academic · arxiv.org · 10 hours ago
One Stone, Three Birds: Self-adaptive Optimal Transport for Multi-VLM Selection, Adaptation, and Ensembling · 👁️ VLMs · Content type: Academic · arxiv.org · 2 days ago
OpenMedReason: Scientific Reasoning Supervision for Medical Vision-Language Models · 💡 AI Reasoning · Content type: Academic · arxiv.org · 10 hours ago
OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics · 🔓 Open-source Models · Content type: Academic · arxiv.org · 2 days ago
Benchmarking Large Language Models for Safety Data Extraction · 💡 AI Reasoning · Content type: Academic · arxiv.org · 10 hours ago