Scour
⚡ Fast AI Inference
Cerebras, Groq, fast LLM tokens
Scoured 125 posts in 41.9 ms
How the hell is Groq raising more money?
🧬 Mythos · zach.be · 3d · Hacker News
Free vLLM Course: Inference, Compression, Benchmarks
🧠 Inference Serving · deeplearning.ai · 2d · Hacker News, r/selfhosted
huawei-csl/KVarN: KVarN is a native vLLM KV-cache quantization backend for your agents: 3-5x more context, throughput above FP16, and FP16-level accuracy. Calibration-free, one flag.
🏗️ LLM Infrastructure · Code · github.com · 15h · Hacker News
Fast and Efficient LLM Inference with vLLM: A New Course with Deeplearning.ai
🧠 Inference Serving · Blog · vllm.ai · 2d · Hacker News
NVIDIA releases Nemotron 3 Ultra, claiming five times the speed and 30 percent lower costs than prior models. The model delivers 300 tokens per second on benchmar...
🗄️ Web Datasets · digg.com · 18h
Serving vLLM for LLM Inference
🏗️ LLM Infrastructure · Blog · beam.cloud · 4d
DriftSched: Adaptive QoS-Aware Scheduling under Runtime Token Drift for Multi-Tenant GPU Inference
🧠 Inference Serving · Academic · arxiv.org · 2d
What Actually Happens When You Send a Prompt to Claude: A Full Breakdown
🪄 Prompt Engineering · pub.towardsai.net · 1h
Intel's mysterious new datacenter GPU is what Nvidia's Rubin CPX nearly was
🖥️ Hardware Architecture · theregister.com · 18h
Making Local LLM Go Brrr
🤖 AI · seanpedersen.github.io · 1d
Sources: ByteDance has partnered with chipmaker InnoStar to develop an AI inference chip modeled after Groq's LPUs, which are built to run AI models at low cost...
🏗️ LLM Infrastructure · techmeme.com · 6d
mirkolenz/llmhop: Tiny, stateless Go router that dispatches OpenAI-compatible requests to single-model vLLM and sglang backends with zero external dependencies
🤖 AI · Code · github.com · 6h · Hacker News
Experimenting with TPUs, GKE Managed DRANET, and Multi-cluster Inference Gateway
🌍 Distributed Systems · Blog · cloud.google.com · 2d
Step 3.7 Flash – 198B-A11B MoE vision-language model
🤖 AI · huggingface.co · 5d · Hacker News
Nemotron 3 Ultra now available on AI Gateway
🪄 Prompt Engineering · vercel.com · 1d
Introducing Granite Libraries and Project Granite Switch
🏗️ LLM Infrastructure · Blog · research.ibm.com · 18h
Llama.cpp now has an official website: llama.app
🤖 AI · llama.app · 6d · Hacker News
Qwen3.7 Plus - Intelligence, Performance & Price Analysis
💰 Tokenomics · artificialanalysis.ai · 1d · Hacker News
Build Personal AI Agents on Windows PCs with New Tools from Microsoft and Nvidia
🤖 AI · Blog · developer.nvidia.com · 2d · Hacker News
Lodestar: An Online-Learning LLM Inference Router
🏗️ LLM Infrastructure · Academic · arxiv.org · 3d