Multimodal AI


Decoding Pedestrian Crossing Intention from Egocentric Vision via Vision Language Models

 👁️Computer Vision  Content type: Academic
arxiv.org

NVlabs/Eagle: Eagle: Frontier Vision-Language Models with Data-Centric Strategies

 🔍Fine-Grained Classification  Content type: Code
github.com

Mi50 32GB / GFX906 - vLLM Qwen 3.5 Configuration for Qwen 3.5:9B AWQ-4bit

 🧠LLMs
huggingface.co · r/LocalLLaMA

A generalist biomedical vision-language model via multi-CLIP knowledge distillation

 🏷️Label Noise  Content type: Academic
nature.com

SpaceX IPO hype is massive — and especially dangerous for investors over 50

 🛰️Geospatial AI
marketwatch.com

Vibe Coding Specificity Foundation Models

 🧠LLMs  Content type: Academic
biorxiv.org

A new chapter of efficient foundation models for medical imaging

 ⚙️MLOps

OpenCV 5.0 Released With Rewritten DNN Engine, Built-In LLM & VLM Support

 👁️Computer Vision
phoronix.com · Hacker News

Can robots read the room?

 🧠LLMs  Content type: News, Academic
news.cornell.edu

OpenCV 5 release - New DNN engine with enhanced ONNX and LLM/VLM support, Intel, Arm, and RISC-V hardware optimizations - CNX Software

 👁️Computer Vision  Content type: News
cnx-software.com

openpilot 0.11.1

 ✍️Prompt Engineering  Content type: Blog
blog.comma.ai

NVIDIA's Cosmos 3: The World's First Fully Open AI Omnimodel

 🤖AI Agents  Content type: News
aimagazine.com

Adapting Vision-Language Models from Iconic to Inclusive for Multi-Label Recognition Without Labels

 🏷️Label Noise  Content type: Academic
arxiv.org

ApertureLab · Synthetic Aperture Sonar Simulator

 👁️Computer Vision
gergltd.com · Hacker News

OpenCV 5 Is Here: The Biggest Leap in Years for Computer Vision

 👁️Computer Vision

Disquiet Junto Project 0754: The Blip

 ✍️Prompt Engineering
llllllll.co

Apple Reveals New AI Architecture Built Around Google Gemini Models

 🛰️Geospatial AI  Content type: News
macrumors.com · Hacker News

dimitrisdimitrov5-blip/Phantomix: The open-source AI browser agent. Free alternative to OpenAI Operator.

 🔌MCP  Content type: Code
github.com · Hacker News

Mbodi AI (YC P25) Is Hiring Founding Machine Learning Engineer (Robotics)

 ⚙️MLOps

MSUE: Multi-Modal Soccer Understanding Expert

 🔍Fine-Grained Classification  Content type: Academic
arxiv.org
