🧠 Multimodal AI
multimodal, vision-language, image-text, foundation model
Scoured 191 posts in 6.7 ms
How 10 AI Startups are Grounding AI in the Real World with Overture
⚙️ AI Tooling
Content type: Blog
overturemaps.org · 6 days ago
One Stone, Three Birds: Self-adaptive Optimal Transport for Multi-VLM Selection, Adaptation, and Ensembling
🔧 LoRA
Content type: Academic
arxiv.org · 2 days ago
Visual AI tracks nearly 100 wildlife species to improve conservation
🎥 AI Video Tools
phys.org · 6 days ago
not much happened today | AINews
🚀 Model Releases
news.smol.ai · 5 days ago
Two Bridges, One Pathway: From VLMs to Generalizable VLAs with Embodied Trajectory-Coupled Data
🚀 Model Releases
Content type: Academic
arxiv.org · 2 days ago
Apples to Apples: MLX vs. Llama.cpp for Gemma 4 12B on an M1 16GB
🔧 LoRA
Content type: Blog
ziraph.com · 5 days ago · Hacker News
Do VLMs Reason Like Engineers? A Benchmark and a Stage-wise Evaluation
⚙️ AI Tooling
Content type: Academic
arxiv.org · 1 day ago
docs: document pdf web tool tests · openclaw/openclaw@1e8609a
📝 Builder Writeups
Content type: Code
github.com · 6 days ago
Less-relevant results
Human-driven sea-level rise has quadrupled the frequency of coastal sea-level extremes since 1900
📝 Builder Writeups
Content type: Academic
nature.com · 1 day ago
AI, Orientalism, and “Calm” Art
🖼️ Image Generation
nightingaledvs.com · 3 days ago
CLASP: Language-Driven Robot Skill Selection and Composition using Task-Parameterized Learning
🚀 Model Releases
Content type: Academic
arxiv.org · 2 days ago
Design a Knowledge Q&A System
⚙️ AI Tooling
newsletter.systemdesign.one · 6 days ago
Why We Have No Idea How to Classify Language Models
🚀 Model Releases
Content type: Blog
medium.com · 6 days ago
OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics
⚙️ AI Tooling
Content type: Academic
arxiv.org · 2 days ago
Self-driving legs to walk me to the office
🖥️ ComfyUI
interconnected.org · 6 days ago
Vision Language Model Helps Private Information De-Identification in Vision Data
🎬 Generative Video
Content type: Academic
arxiv.org · 2 days ago
Pinterest's $4 billion AWS deal and the cost of AI-powered discovery
✨ Creative AI
ppc.land · 5 days ago
Decoding Pedestrian Crossing Intention from Egocentric Vision via Vision Language Models
⚙️ AI Tooling
Content type: Academic
arxiv.org · 2 days ago
One Token per Multimodal Evidence: Latent Memory for Resource-Constrained QA
🔧 LoRA
Content type: Academic
arxiv.org · 1 day ago
AgenticNav: Zero-Shot Vision-and-Language Navigation as a Tool-Calling Harness
🎥 AI Video Tools
Content type: Academic
arxiv.org · 1 day ago