vllm-project/vllm
github.com · 76 weeks ago · Covered in 24 articles from 13 sources
Mi50 32GB / GFX906 - vLLM Qwen 3.5 Configuration for Qwen 3.5:9B AWQ-4bit
huggingface.co · 1 day ago · r/LocalLLaMA
MiniMaxAI/MiniMax-M3
huggingface.co · 2 days ago · r/LocalLLaMA
Why agentic AI needs an open inference stack
redhat.com · 5 days ago
3-Part Series: LLM Latency in Production (Part 1)
towardsai.net · 1 week ago
Learn to optimize, deploy, and benchmark LLMs with vLLM: A New Free Course
developers.redhat.com · 1 week ago
Fast and Efficient LLM Inference with vLLM: A New Course with Deeplearning.ai
vllm.ai · 1 week ago · Hacker News
Improve vLLM Semantic Router accuracy with fine-tuning
developers.redhat.com · 1 week ago
The Community Champions Program
zed.dev · 1 week ago
The Roadmap for Mastering LLMOps in 2026
machinelearningmastery.com · 1 week ago
Introducing OpenRL: A self-hosted post-training API for fine-tuning LLMs
opensource.googleblog.com · 1 week ago · Blogger
Anthropic raises $65B in Series H at a $965B post-money valuation, releases Opus 4.8 and Dynamic Workflows
news.smol.ai · 2 weeks ago
Structured LLM Outputs
dottxt-ai.github.io · 2 weeks ago · Hacker News
EAGLE 3.1: Advancing Speculative Decoding Through Collaboration Between the EAGLE Team, vLLM, and TorchSpec
vllm.ai · 2 weeks ago · Hacker News
Gemma 4 dense by default: why your local agent doesn't want the MoE
dev.to · 2 weeks ago · DEV
Multi-Head Latent Attention (MLA)
dev.to · 2 weeks ago · DEV
How to prevent AI inference stack silent failures
developers.redhat.com · 3 weeks ago
RL Doesn't Work on Slurm
blog.skypilot.co · 3 weeks ago · Hacker News
llmfan46/MiniMax-M2.7-ultra-uncensored-heretic-GGUF
huggingface.co · 3 weeks ago · r/LocalLLaMA
Gemma 4 Didn't Just Get Smarter. It Became a Different Kind of Model. Here's What the Agentic Numbers Actually Mean.
dev.to · 3 weeks ago · DEV
The AI stack every developer will depend on in 2026
dev.to · 3 weeks ago · DEV
Apple M5 Max vs NVIDIA: AI Performance Verdict (2026)
skorppio.com · 3 weeks ago
What GenAI Actually Costs in Production
dev.to · 3 weeks ago · DEV
DreamFast/Qwen3.6-27B-Uncensored-HauhauCS-Aggressive-Safetensor-Benchmark
huggingface.co · 3 weeks ago
Used over a million tokens in three separate sessions to test Qwen 3.6 35b (new Multi-token Prediction version)
huggingface.co · 4 weeks ago · r/LocalLLaMA