🔮 Multimodal AI
multimodal, vision-language, audio-visual, foundation models
Scoured 776 posts in 16.4 ms
An Effective Router for Vision-Language Model Selection
🤖 AI Engineering · Content type: Academic · arxiv.org · 1 day ago
A generalist biomedical vision-language model via multi-CLIP knowledge distillation
🧠 LLM Research · Content type: Academic · nature.com · 9 hours ago
NVlabs/Eagle: Eagle: Frontier Vision-Language Models with Data-Centric Strategies
👁️ Computer Vision · Content type: Code · github.com · 4 days ago
Siri's biggest upgrade in years comes with help from Gemini
🎙️ Speech AI · Content type: News · androidcentral.com · 1 day ago
How Will the Multimodal AI Market Grow Through 2034 Amid Emerging Trends and Business Strategies?
🤖 Robotics · Content type: Blog · semiconinsights.wordpress.com · 4 days ago
Gemini lied to me about my hobby, and that showed me what its real problem is
🛡️ AI Safety · androidpolice.com · 2 days ago
Apple Reveals New AI Architecture Built Around Google Gemini Models
🤖 AI Engineering · Content type: News · macrumors.com · 1 day ago · Hacker News
Introducing Gemma 4 12B: a unified, encoder-free multimodal model
🤖 AI Engineering · Content type: Blog · blog.google · 6 days ago · DEV, Hacker News, r/LocalLLaMA
Nano Banana Pro (Gemini 3 Pro Image): Developer Guide & API 2026
👁️ Computer Vision · Content type: Blog · wowhow.cloud · 5 days ago · DEV
OpenCV 5.0 Released With Rewritten DNN Engine, Built-In LLM & VLM Support
👁️ Computer Vision · phoronix.com · 3 days ago · Hacker News
Do Vision-Language Models See or Guess? Measuring and Reducing Textual-Prior Reliance with a Phrasing-Controlled Benchmark
🧠 LLM Research · Content type: Academic · arxiv.org · 5 hours ago
linzhiqiu/t2v_metrics: Evaluating text-to-image/video/3D models with VQAScore
🎮 GPU Programming · Content type: Code · github.com · 16 hours ago · Hacker News
A Controlled Audit of Pretraining Contamination in Public Medical Vision-Language Benchmarks
🧠 LLM Research · Content type: Academic · arxiv.org · 5 hours ago
I've been using Gemini all wrong, and I only realized it when I stopped typing
🗄️ Database Internals · androidpolice.com · 3 days ago
MLingualFC: Evaluating Jailbreak Vulnerabilities in Multilingual Vision-Language Models
🎙️ Speech AI · Content type: Academic · arxiv.org · 1 day ago
DB-3DME: From Dataset to Benchmark for Human-aligned Automatic 3D Mesh Evaluation
🛡️ AI Safety · Content type: Academic · arxiv.org · 5 hours ago
Geometric Coastline Localization using Vision-Language Models
👁️ Computer Vision · Content type: Academic · arxiv.org · 5 hours ago
I uploaded hundreds of forgotten screenshots into Gemini, and the results freaked me out
🗄️ Database Internals · androidpolice.com · 3 days ago
Are Reasoning Vision-Language Models Robust to Semantic Visual Distractions?
👁️ Computer Vision · Content type: Academic · arxiv.org · 1 day ago
Multimodal Brain Tumour Classification Using Feature Fusion
👁️ Computer Vision · Content type: Academic · arxiv.org · 5 hours ago