🌐 Open Source AI
open source LLM, Llama, Mistral, open weights model
Scoured 541 posts in 6.4 ms
Google fills out the middle with the Gemma 4 12B
🎨 AI UX
jonpeddie.com · 3 days ago
12B Gemma 4 QAT Deployment with NVIDIA L4, Cloud Run, MCP, and Antigravity CLI
⚙️ LLMOps · Content type: Blog
medium.com · 1 day ago
Unsloth Minimax M3 GGUF
💻 AI Engineering
huggingface.co · 13 hours ago · r/LocalLLaMA
Mistral reportedly seeking $3.5B funding round amid physics AI push
💻 AI Engineering · Content type: Video
siliconangle.com · 4 hours ago
massimo92/spark: CLI tool for serving LLMs with vLLM on NVIDIA DGX Spark. One file, zero friction.
💻 AI Engineering · Content type: Code
github.com · 1 day ago · Hacker News
From Chatbot Hallucinations to Deterministic Agents: Forcing Local LLMs to Run Production-Grade…
⚙️ LLMOps · Content type: Blog
medium.com · 15 hours ago
You don't need Copilot for code completion, try this instead
🔍 LLM Tracing
mistral.ai · 4 days ago · r/GithubCopilot · Cited by 1 article
Ollama 0.30 GPU Boost: Faster local Qwen inference on NVIDIA
🔧 MLOps
everylocalai.com · 2 days ago · DEV
Cohere’s North Mini Code Lets Devs Stack Their Own AI
🎨 AI UX
devops.com · 10 hours ago
DiffusionGemma: 4x Faster Text Generation
⚙️ LLMOps · Content type: News, Blog
blog.google · 2 days ago · Hacker News, r/LocalLLaMA, r/singularity · Cited by 21 articles
Mistral is rumored to be raising €3B at €20 valuation
🧠 LLMs
techcrunch.com · 10 hours ago · Cited by 1 article
Google Shrank Gemma 4 by 72% and Unsloth Fixed the 4-Bit Bug Nobody Else Caught on One 4090, and 4-Bit Shouldn’t Be This Good
💻 AI Engineering · Content type: Blog
towardsai.net · 4 days ago
Ollama's highest performance on Apple Silicon yet with MLX
✍️ Prompt Engineering · Content type: Blog
ollama.com · 2 days ago
Lowest-Cost LLM Inference: The Complete OpenRouter Guide
⚙️ LLMOps · Content type: Blog, Discussion, Tutorial
openrouter.ai · 12 hours ago
MTP Isn't Always a Win: 1.95x on My 3090, but Speculative Decoding Is Hardware-Dependent
🔍 LLM Tracing · Content type: Blog
bric.pe.kr · 4 days ago · DEV · Cited by 1 article
How to Run an LLM Locally: Ultimate Guide to Local AI 2026
🧠 LLMs · Content type: Blog
cswithsanjay.blogspot.com · 1 day ago
Modular: Day Zero: MiniMax M3 Open Weights on Modular Cloud
🔍 LLM Tracing · Content type: Blog
modular.com · 10 hours ago
Inferoa AI harness claimed 90% cache savings. We ran it and measured 97.8%
⚙️ LLMOps
zozo123.github.io · 2 days ago · Hacker News
Clairvoyant: Predictive SJF Scheduling to Mitigate Head-of-Line Blocking in Serial LLM Backends
💻 AI Engineering · Content type: Academic
arxiv.org · 5 days ago
AMD's Lemonade SDK For Local AI Adds NVIDIA CUDA Support
💻 AI Engineering
phoronix.com · 2 days ago · r/artificial · Cited by 1 article