Multimodal AI

OpenCV 5.0 Released With Rewritten DNN Engine, Built-In LLM & VLM Support

 👁️VLMs
phoronix.com··Hacker News

Task-Aligned Stability Analysis of Vision-Language Models for Autonomous Driving Hazard Detection

 👁️VLMs  Content type: Academic
arxiv.org·

Less-relevant results

fix(microsoft-foundry): filter unsupported Anthropic deployments · openclaw/openclaw@1240de7

 🔧Tool Use  Content type: Code
github.com·

4DP-QA: Scalable QA for 4D Perception in Vision Language Models

 👁️VLMs  Content type: Academic
arxiv.org·

VL-DINO: Leveraging CLIP Vision-Language Knowledge for Open-Vocabulary Object Detection

 👁️VLMs  Content type: Academic
arxiv.org·

linzhiqiu/t2v_metrics: Evaluating text-to-image/video/3D models with VQAScore

 🔓Open-source Models  Content type: Code
github.com··Hacker News

APT: Action Expert Pretraining Improves Instruction Generalization of Vision-Language-Action Policies

 🤖Embodied AI  Content type: Academic
arxiv.org·

AVIS: Adaptive Test-Time Scaling for Vision-Language Models

 🖥️Inference Compute  Content type: Academic
arxiv.org·

Reroute, Don't Remove: Recoverable Visual Token Routing for Vision-Language Models

 👁️VLMs  Content type: Academic
arxiv.org·

When to Align, When to Predict: A Phase Diagram for Multimodal Learning

 👁️VLMs  Content type: Academic
arxiv.org·

GoodQ02/goodq4all: Local-first multimodal epistemic memory for scene-level video, audio, and text intelligence.

 Quantization  Content type: Code
github.com··Hacker News

MLingualFC: Evaluating Jailbreak Vulnerabilities in Multilingual Vision-Language Models

 👁️VLMs  Content type: Academic
arxiv.org·

From Prompts to Tokens: Internalizing Causal Supervision in Vision-Language Model for Multi-Image Causal Reasoning

 👁️VLMs  Content type: Academic
arxiv.org·

Are Reasoning Vision-Language Models Robust to Semantic Visual Distractions?

 👁️VLMs  Content type: Academic
arxiv.org·

fix: preserve Foundry Responses reasoning replay ids · openclaw/openclaw@248dfb2

 🔧Tool Use  Content type: Code
github.com·

World Model Self-Distillation: Training World Models to Solve General Tasks

 👁️VLMs  Content type: Academic
arxiv.org·

One Stone, Three Birds: Self-adaptive Optimal Transport for Multi-VLM Selection, Adaptation, and Ensembling

 👁️VLMs  Content type: Academic
arxiv.org·

OpenMedReason: Scientific Reasoning Supervision for Medical Vision-Language Models

 💡AI Reasoning  Content type: Academic
arxiv.org·

OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics

 🔓Open-source Models  Content type: Academic
arxiv.org·

Benchmarking Large Language Models for Safety Data Extraction

 💡AI Reasoning  Content type: Academic
arxiv.org·
