vllm-project/vllm
24 articles covering this post
github.com · 77 weeks ago · DEV, Hacker News
Covered in 24 articles
Gemma 4 dense by default: why your local agent doesn't want the MoE
dev.to · 2 weeks ago · DEV
Multi-Head Latent Attention (MLA)
dev.to · 2 weeks ago · DEV
Gemma 4 Didn't Just Get Smarter. It Became a Different Kind of Model. Here's What the Agentic Numbers Actually Mean.
dev.to · 3 weeks ago · DEV
The AI stack every developer will depend on in 2026
dev.to · 3 weeks ago · DEV
What GenAI Actually Costs in Production
dev.to · 3 weeks ago · DEV
The Community Champions Program
zed.dev · 1 week ago
Mi50 32GB / GFX906 - vLLM Qwen 3.5 Configuration for Qwen 3.5:9B AWQ-4bit
huggingface.co · 1 day ago · r/LocalLLaMA
MiniMaxAI/MiniMax-M3
huggingface.co · 2 days ago · r/LocalLLaMA
llmfan46/MiniMax-M2.7-ultra-uncensored-heretic-GGUF
huggingface.co · 3 weeks ago · r/LocalLLaMA
DreamFast/Qwen3.6-27B-Uncensored-HauhauCS-Aggressive-Safetensor-Benchmark
huggingface.co · 3 weeks ago
Used over a million tokens in three separate sessions to test Qwen 3.6 35b (new Multi-token Prediction version)
huggingface.co · 4 weeks ago · r/LocalLLaMA
Learn to optimize, deploy, and benchmark LLMs with vLLM: A New Free Course
developers.redhat.com · 1 week ago
Improve vLLM Semantic Router accuracy with fine-tuning
developers.redhat.com · 1 week ago
How to prevent AI inference stack silent failures
developers.redhat.com · 3 weeks ago
Structured LLM Outputs
dottxt-ai.github.io · 2 weeks ago · Hacker News
Apple M5 Max vs NVIDIA: AI Performance Verdict (2026)
skorppio.com · 3 weeks ago
Why agentic AI needs an open inference stack
redhat.com · 5 days ago
The Roadmap for Mastering LLMOps in 2026
machinelearningmastery.com · 1 week ago
Anthropic raises $65B in Series H at a $965B post-money valuation, releases Opus 4.8 and Dynamic Workflows
news.smol.ai · 2 weeks ago
RL Doesn't Work on Slurm
blog.skypilot.co · 3 weeks ago · Hacker News
3-Part Series: LLM Latency in Production (Part 1)
towardsai.net · 1 week ago
Introducing OpenRL: A self-hosted post-training API for fine-tuning LLMs
opensource.googleblog.com · 1 week ago · Blogger
Fast and Efficient LLM Inference with vLLM: A New Course with Deeplearning.ai
vllm.ai · 1 week ago · Hacker News
EAGLE 3.1: Advancing Speculative Decoding Through Collaboration Between the EAGLE Team, vLLM, and TorchSpec
vllm.ai · 2 weeks ago · Hacker News