🔮 Multimodal AI
multimodal, vision-language, audio-visual, foundation models
Scoured 796 posts in 6.8 ms
An Effective Router for Vision-Language Model Selection
🤖 AI Engineering
Content type: Academic
arxiv.org · 1 day ago
A generalist biomedical vision-language model via multi-CLIP knowledge distillation
🧠 LLM Research
Content type: Academic
nature.com · 18 hours ago
NVlabs/Eagle: Eagle: Frontier Vision-Language Models with Data-Centric Strategies
👁️ Computer Vision
Content type: Code
github.com · 4 days ago
What I Learned Building a Multimodal AI Studio Solo on Gemini + Veo
🎙️ Speech AI
Content type: Discussion
geminiomni-ai.com · 6 hours ago · DEV
Siri's biggest upgrade in years comes with help from Gemini
🎙️ Speech AI
Content type: News
androidcentral.com · 1 day ago
How Will the Multimodal AI Market Grow Through 2034 Amid Emerging Trends and Business Strategies?
🤖 Robotics
Content type: Blog
semiconinsights.wordpress.com · 5 days ago
Gemini lied to me about my hobby, and that showed me what its real problem is
🛡️ AI Safety
androidpolice.com · 2 days ago
Nano Banana Pro (Gemini 3 Pro Image): Developer Guide & API 2026
👁️ Computer Vision
Content type: Blog
wowhow.cloud · 5 days ago · DEV
When to Align, When to Predict: A Phase Diagram for Multimodal Learning
👁️ Computer Vision
Content type: Academic
arxiv.org · 14 hours ago
linzhiqiu/t2v_metrics: Evaluating text-to-image/video/3D models with VQAScore
🎮 GPU Programming
Content type: Code
github.com · 1 day ago · Hacker News
One Token per Multimodal Evidence: Latent Memory for Resource-Constrained QA
🧠 LLM Research
Content type: Academic
arxiv.org · 14 hours ago
I've been using Gemini all wrong, and I only realized it when I stopped typing
🗄️ Database Internals
androidpolice.com · 4 days ago
Do VLMs Reason Like Engineers? A Benchmark and a Stage-wise Evaluation
🤖 AI Engineering
Content type: Academic
arxiv.org · 14 hours ago
MLingualFC: Evaluating Jailbreak Vulnerabilities in Multilingual Vision-Language Models
🎙️ Speech AI
Content type: Academic
arxiv.org · 1 day ago
I uploaded hundreds of forgotten screenshots into Gemini, and the results freaked me out
🗄️ Database Internals
androidpolice.com · 4 days ago
Do Vision-Language Models See or Guess? Measuring and Reducing Textual-Prior Reliance with a Phrasing-Controlled Benchmark
🧠 LLM Research
Content type: Academic
arxiv.org · 14 hours ago
Are Reasoning Vision-Language Models Robust to Semantic Visual Distractions?
👁️ Computer Vision
Content type: Academic
arxiv.org · 1 day ago
Budget Android phones are finally getting flagship AI — if manufacturers don't cut corners everywhere else
🎙️ Speech AI
androidpolice.com · 4 days ago · r/Android
Cross-Modal Knowledge Distillation without Paired Data: Theoretical Foundation and Algorithm
🧠 LLM Research
Content type: Academic
arxiv.org · 14 hours ago
Textual Supervision Enhances Geospatial Representations in Vision-Language Models
🧠 LLM Research
Content type: Academic
arxiv.org · 2 days ago