Scour
Systems-level optimizations for LLM serving
Scoured 139 posts in 7.0 ms
High-end Hitachi Vantara arrays and Nvidia AI support
🤖 Agents using LLMs
Content type: News
blocksandfiles.com · 11 hours ago
Qwen 3.6 27B AutoRound GGUF, need your feedback
✨ Model optimizations in LLMs
huggingface.co · 2 days ago · r/LocalLLaMA
High Bandwidth Flash | A New Memory for AI Data Centers and Edge Computing | Sandisk
📊 AI Performance Profiling
ncnonline.net · 2 days ago
1-bit and 1.58 bit LLM Benchmarking on Jetson Orin Nano Super | Bonsai LM
🧠 Large Language Models (LLMs)
smolhub.com · 3 days ago · r/LocalLLaMA
How to Measure Time To First Token (TTFT) in AI Systems
💬 Prompt optimizations for LLM serving
qainsights.com · 5 days ago · Hacker News
VIA-SD: Verification via Intra-Model Routing for Speculative Decoding
💬 Prompt optimizations for LLM serving
Content type: Academic
arxiv.org · 22 hours ago
Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency
✨ Model optimizations in LLMs
Content type: News, Blog
blog.google · 6 days ago · Hacker News
Machinic Psychopharmacology: Do LLMs Self-Medicate?
🚀 LLM serving frameworks
lesswrong.com · 1 day ago · Hacker News
China's Xiaomi MiMo Is Now 15X Faster Than ChatGPT and Claude (4 minute read)
🧠 Large Language Models (LLMs)
Content type: News
decrypt.co · 3 days ago · Hacker News
Making Local LLM Fast
🧠 Large Language Models (LLMs)
bogdan.nimblex.net · 2 days ago · Hacker News
libertywing/FlashMemory-Deepseek-V4: FlashMemory DS-V4 Retriever: a lightweight retriever that sparsifies DeepSeek-V4 CSA KV-cache. Weights available on Hugging Face.
📊 AI Performance Profiling
Content type: Code
github.com · 2 days ago
MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 TPS
✨ Model optimizations in LLMs
Content type: Blog
mimo.xiaomi.com · 4 days ago · Hacker News, r/LocalLLaMA
Claude Fable 5 🚀, Gemini 3.5 Live Translate 📱, scaling test time compute 📈
🤖 Agents using LLMs
tldr.tech · 2 days ago
Building & Benchmarking: LLMs on a 16GB Jetson Orin NX for Hermes Agent
🧠 Large Language Models (LLMs)
Content type: Blog
dnhkng.github.io · 3 days ago
Youssof Altoukhi (@Youssofal_)
🧠 Large Language Models (LLMs)
xcancel.com · 4 days ago · r/LocalLLaMA
RKSC: Reasoning-Aware KV Cache Sharing and Confident Early Exit for Multi-Step LLM Inference
🚀 LLM serving frameworks
Content type: Academic
arxiv.org · 1 day ago
Nvidia DGX Spark GB10 – AI Models and Guide with vLLM and Autonomous Script
🚀 LLM serving frameworks
Content type: Code
github.com · 6 days ago · Hacker News
LLM Research Papers: The 2026 List (January to May)
🧠 Large Language Models (LLMs)
Content type: News
magazine.sebastianraschka.com · 5 days ago · Hacker News
Rebellions Bets on Memory-Centric Architecture as it Weighs IPO Options
⚡ Real-time AI Systems
Content type: News
eetimes.com · 4 hours ago
iOS Security SDKs & Audits for Production Teams
✨ Model optimizations in LLMs
Content type: Discussion
sentinelden.com · 5 hours ago · Hacker News