🖼️ Multimodal AI
multimodal, vision language models, VLM, image-text models
typomonster/parlor-jarvis: On-device, real-time multimodal AI. Multilingual voice + vision (en/ko/es/pt/fr) with camera, screen, PDF, and video — runs entirely locally.
✨ Gemini · github.com · 4d · Hacker News
Building Smart Student Engagement Detector: An AI-Powered Early Learning Issue Detection System using ML, NLP & Multimodal Analytics
💬 NLP · github.com · 3d · DEV
M$^3$-VQA: A Benchmark for Multimodal, Multi-Entity, Multi-Hop Visual Question Answering
✨ Gemini · arxiv.org · 2d
This $80 platform can replace your entire AI stack
🇨🇳 Chinese AI · macworld.com · 3d
A layout engine for image generation in JavaScript.
✏️ Code Editors · sone.seanghay.com · 3d · Hacker News
Benchmarking Complex Multimodal Document Processing Pipelines: A Unified Evaluation Framework for Enterprise AI
🔍 Information Extraction · arxiv.org · 1d
End-2-end tutorial on fine-tuning, the whole journey
🔌 Embedded Systems · docs.liquid.ai · 3d · r/LocalLLaMA
Three-Step Nav: A Hierarchical Global-Local Planner for Zero-Shot Vision-and-Language Navigation
🤝 Human-AI Collaboration · arxiv.org · 1d
NVIDIA Nemotron 3 Nano Omni Powers Multimodal Agent Reasoning in a Single Efficient Open Model
✨ Gemini · developer.nvidia.com · 2d · Hacker News
SWAN: World-Aware Adaptive Multimodal Networks for Runtime Variations
✨ Gemini · arxiv.org · 1d
World2VLM: Distilling World Model Imagination into VLMs for Dynamic Spatial Reasoning
✨ Gemini · arxiv.org · 1d
The Thinking Pixel: Recursive Sparse Reasoning in Multimodal Diffusion Latents
✨ Gemini · arxiv.org · 2d
Topology-Aware Representation Alignment for Semi-Supervised Vision-Language Learning
🎯 Alignment Research · arxiv.org · 1d
EmoTrans: A Benchmark for Understanding, Reasoning, and Predicting Emotion Transitions in Multimodal LLMs
✨ Gemini · arxiv.org · 3d
Beyond Accuracy: Benchmarking Cross-Task Consistency in Unified Multimodal Models
✨ Gemini · arxiv.org · 2d
FASH-iCNN: Making Editorial Fashion Identity Inspectable Through Multimodal CNN Probing
✨ Gemini · arxiv.org · 1d
Vision-Language-Action in Robotics: A Survey of Datasets, Benchmarks, and Data Engines
✨ Gemini · arxiv.org · 3d
DualFact+: A Multimodal Fact Verification Framework for Procedural Video Understanding
✨ Gemini · arxiv.org · 2d
Libra-VLA: Achieving Learning Equilibrium via Asynchronous Coarse-to-Fine Dual-System
✨ Gemini · arxiv.org · 2d
Source-Modality Monitoring in Vision-Language Models
✨ Gemini · arxiv.org · 4d