Scour
⚡ Fast AI Inference
Cerebras, Groq, fast LLM tokens
Scoured 218 posts in 27.4 ms
Nvidia paid Groq $20 billion and took its top engineers. Now Groq is raising $650 million for what’s left.
🇨🇳
Chinese AI
thenextweb.com
·
5d
Part 3 — Implementation/Engine-Level: Choosing the Runtime That Gives You These for Free
🏗️
LLM Infrastructure
towardsai.net
·
1d
huawei-csl/KVarN: KVarN is a native vLLM KV-cache quantization backend for your agents: 3-5x more context, throughput above FP16, and FP16-level accuracy. Calibration-free, one flag.
🏗️
LLM Infrastructure
Code
github.com
·
17h
·
Hacker News
Free vLLM Course: Inference, Compression, Benchmarks
🧠
Inference Serving
deeplearning.ai
·
2d
·
Hacker News, r/selfhosted
Build a Medical Report Analyzer on Dedicated Inference with Python
🇨🇳
Chinese AI
digitalocean.com
·
21h
How the hell is Groq raising more money?
🧬
Mythos
zach.be
·
3d
·
Hacker News
Fast and Efficient LLM Inference with vLLM: A New Course with Deeplearning.ai
🧠
Inference Serving
Blog
vllm.ai
·
2d
·
Hacker News
NVIDIA releases Nemotron 3 Ultra, claiming five times the speed and 30 percent lower costs than prior models. The model delivers 300 tokens per second on benchmar...
🗄️
Web Datasets
digg.com
·
19h
LLM Inference Engineering Room — Part 3: The Orchestration Layer
🧠
LLM Inference
Blog
vimal-dwarampudi.medium.com
·
1d
Serving vLLM for LLM Inference
🏗️
LLM Infrastructure
Blog
beam.cloud
·
4d
Scale On-Prem AI with Foundry Local on Azure Local: Multi-Node Inference and vLLM Support
🧠
Inference Serving
techcommunity.microsoft.com
·
2d
New comment by tjsawyer in "Ask HN: Who wants to be hired? (June 2026)"
🤖
AI
Discussion
news.ycombinator.com
·
19h
·
Hacker News
After Nvidia’s $20B not-acqui-hire, AI chip startup Groq reportedly raising $650M
🖥
GPUs
techcrunch.com
·
6d
mirkolenz/llmhop: Tiny, stateless Go router that dispatches OpenAI-compatible requests to single-model vLLM and sglang backends with zero external dependencies
🤖
AI
Code
github.com
·
8h
·
Hacker News
DriftSched: Adaptive QoS-Aware Scheduling under Runtime Token Drift for Multi-Tenant GPU Inference
🧠
Inference Serving
Academic
arxiv.org
·
2d
Accelerate autoscaling inference in Red Hat AI with Everpure
🏗️
LLM Infrastructure
redhat.com
·
3d
Intel's mysterious new datacenter GPU is what Nvidia's Rubin CPX nearly was
🖥️
Hardware Architecture
theregister.com
·
19h
How attackers are gaining access to LLM inference
🤖
AI
Blog
intezer.com
·
1d
Speculators v0.5.0: DFlash support and online training
🏗️
LLM Infrastructure
developers.redhat.com
·
1d
How to Run Gemma 4 12B Locally - The Best AI For Consumer Laptops
🤖
AI
Video
youtube.com
·
22h