Scour
🤖 LLM Inference
Specific
Model Serving, Quantization, vLLM, ONNX Runtime
Scoured 191 posts in 12.7 ms
Majestic Labs Raises $100M for Memory Pooling AI Server
🧠
Memory Allocators
eetimes.com
·
21h
MCP Bridge Part 3: How we made getProcInfo3() agent-readable: hybrid discovery + AI Enrichment
🔮
Speculative Decoding
appfactor.io
·
4h
·
Hacker News
AMD is ready to ship Halo
🚀
Performance
jonpeddie.com
·
2d
A Kubernetes operator for local LLMs across Nvidia and Mac fleets
⚙️
MLOps
llmkube.com
·
6d
·
Hacker News
[AINews] Anthropic raises $965B Series H, releases Opus 4.8 and Dynamic Workflows/ultracode
🚀
Performance
latent.space
·
17h
GPU autoscaling on Kubernetes with KEDA: Building an external scaler
📊
Performance Tools
cncf.io
·
2d
Testing llama.cpp PR #21344: Faster MoE Prefill, but MTP Fights Back
📊
Profiling Tools
sleepingrobots.com
·
3d
FuriosaAI partners with Broadcom to build next-generation inference platform for the Agentic Era
📡
Edge AI
furiosa.ai
·
2d
grpyc: Up to 8x faster gRPC Python in Rust
⚡
gRPC
grpyc.com
·
2d
·
Hacker News
EAGLE 3.1: Advancing Speculative Decoding Through Collaboration Between the EAGLE Team, vLLM, and TorchSpec
🔮
Speculative Decoding
vllm.ai
·
3d
·
Hacker News
A Guide to AI Cold Starts on Cloud Run
🚀
Performance
cloud.google.com
·
2d
Llama.cpp now has an official website: llama.app
🪟
Tauri
llama.app
·
2h
·
Hacker News
Shipping a Trillion Parameters With a Hub Bucket: Delta Weight Sync in TRL
🔌
Hardware-in-the-Loop
huggingface.co
·
2d
CPUs return to the AI core
⚙️
Performance Profiling
jonpeddie.com
·
6d
MMBT-Messy-Model-Bench-Tests/hardware-tests/step3.7-flash-nvfp4-dual-blackwell-2026-05-28 at main · Light-Heart-Labs/MMBT-Messy-Model-Bench-Tests
🚀
Performance
github.com
·
14h
·
r/LocalLLaMA
The LLM Inference Optimization: Quantization to Speculative Decoding Part 2
🔮
Speculative Decoding
digitalocean.com
·
2d
Fingerprinting Inference Systems of Large Language Models
🧠
LLMs
arxiv.org
·
15h
Local LLMs Are Getting Easier: The Complete Guide (2026)
✍️
Prompt Engineering
sitepoint.com
·
1d
Build high-performance generative AI systems with Strands Agents, NVIDIA NIM, and Amazon Bedrock AgentCore
🎯
AI Agents
aws.amazon.com
·
3d
[AINews] Cognition raises $1B in $26B Series D
📡
Edge AI
latent.space
·
1d