⚡ Fast AI Inference
Cerebras, Groq, fast LLM tokens
Scoured 112 posts in 45.9 ms
🏗️ LLM Infrastructure · groq.com · 3 days ago
Groq Raises Another $650M
Covered by 6 sources, including TechCrunch and TNW | Artificial-Intelligence
Discussed on Hacker News
🤖 AI · NVIDIA Technical Blog · 1 day ago
Boost Inference Performance up to 15x on NVIDIA Blackwell Using DFlash Speculative Decoding
Covers 3 stories, including NVIDIA/TensorRT-LLM
🏗️ LLM Infrastructure · GitHub · 11 hours ago
For users with 4x-8x 6000 PROs, how is your experience with bigger models lately? (GLM 5.2, Kimi 2.7, DeepSeek V4 Pro)
Discussed on r/LocalLLaMA
⚡ Hardware Acceleration · The Register · 2 hours ago
ZTE builds a TCO-optimal AI factory to fuel token economy
🏗️ LLM Infrastructure · blog.skypilot.co · 1 day ago
SkyPilot Endpoints: Production-Ready Inference on Every Cluster You Own
Discussed on Hacker News
🔓 Open Source AI · Anyscale blog posts · 6 days ago
High Performance Distributed Inference with Ray Serve LLM
Covered by Google Cloud Blog
Discussed on Hacker News
🏗️ LLM Infrastructure · primeintellect.ai · 2 days ago
RL at 1T Scale: prime-rl Performance Deep Dive
Covers 6 stories, including "Kimi K2.7-Code: open-source coding model with better token efficiency"
🧩 MoE · Modal · 1 day ago
Achieve state-of-the-art inference latencies with speculative decoding
Covers "DFlash: Block Diffusion for Flash Speculative Decoding"
🤖 AI · cerebras.ai · 6 days ago
Gemma 4 on Cerebras—The Fastest Inference is Now Multimodal
Covers "Home | ArtificialAnalysis.ai"
Covered by habr.com
🏗️ LLM Infrastructure · Baseten · 2 days ago
We built the fastest API for GLM-5.2 (280 TPS)
Covers "GLM-5.2 (6 minute read)"
Discussed on Hacker News
🏗️ LLM Infrastructure · GitHub · 2 hours ago
Show HN: ParseHawk – 100% Local Document AI with API, CLI, and Web UI
Covers 2 stories, including "Installation"
Discussed on Hacker News
🔓 Open Source AI · portal.neuralwatt.com · 3 days ago
Neuralwatt: Energy-based pricing for AI inference. Efficient prompts cost less
Discussed on Hacker News
🔓 Open Source AI · IBM Research · 1 day ago
Running AI on mixed hardware for speed and affordability
Covers "Introduction to llm-d Open-source Kubernetes-native Framework for Distributed LLM Inference | Ep 140 #cloudnativefm"
🏗️ LLM Infrastructure · Towards AI · 1 day ago
Stop Crashing and Start Cooking with vLLM on AMD and Lemonade Server
🆕 New AI · Hugging Face · 6 days ago
225B-A23B
Covered by mail.bycloud.ai, news.smol.ai
Discussed on r/LocalLLaMA
🧠 LLM Inference · arXiv · 2 days ago
Recency/Frequency Adaptive KV Caching for Large Language Model Serving
⚡ Performance · graphsignal.com · 3 days ago
CUDA Profiler for Production Inference
Discussed on Hacker News
🏗️ LLM Infrastructure · GitHub · 1 day ago
Generate per-session LoRA adapters in <1s for agentic inference efficiency
Discussed on Hacker News
⚡ Performance · GitHub · 1 day ago
Show HN: CUDA Profiler for Production Inference
Covered by tldr.tech
Discussed on Hacker News
🏗️ LLM Infrastructure · GitHub · 6 days ago
Profile(v2.1.4) physics-aware optimizer for vLLM (31→470 tok/s on A100)
Discussed on Hacker News