KV Cache
Specific
KV cache, key-value cache, attention cache, LLM inference cache
Scoured 183 posts in 32.3 ms
GLM-5.2: Z.ai Ships 1M-Token Coding Model With Zero Benchmarks
💻 Software Engineering · Content type: Blog · wowhow.cloud · 3 days ago · DEV · Covers: DEV Community
12B Gemma 4 QAT Deployment with NVIDIA L4, Cloud Run, MCP, and Antigravity CLI
🧠 LLM Inference · Content type: Blog · medium.com · 6 days ago
Mlx-optiq: per-layer mixed-precision LLM quantization for Apple Silicon
💬 LLMs · Content type: Video, Discussion, Tutorial · mlx-optiq.com · 4 days ago · Hacker News · Cited by 2 articles
PolyKV: Heterogeneous Retention and Allocation for KV Cache Compression
🔢 Vector DBs · Content type: Academic · arxiv.org · 2 days ago
Show HN: Quant Picker – which GGUF file fits your model and machine
💬 LLMs · vettedconsumer.com · 5 days ago · Hacker News
Rebellions bets on memory-centric AI inference
🧠 LLM Inference · jonpeddie.com · 21 hours ago
zai-org/GLM-5.2 is here!
🧠 LLM Inference · huggingface.co · 1 day ago · Hacker News, Hacker News, r/LocalLLaMA · Cited by 9 articles · Covers 7 stories
Inference cost at scale with napkin math (13 minute read)
🧠 LLM Inference · Content type: Blog · injuly.in · 3 days ago · Cited by 1 article · Covers: Fermi Problem
Native Inference Engine for macOS 14 or newer
🧠 LLM Inference · Content type: Code · github.com · 1 day ago · Hacker News
Inside the LLM KV Cache: The Hidden System Behind Fast AI Inference
🧠 LLM Inference · Content type: Blog · fardinkai.medium.com · 4 days ago
I gave my gaming PC and phone the same local LLM tasks, and only one of them is still in my daily rotation
🧠 LLM Inference · xda-developers.com · 7 hours ago
vLLM Transformers Backend: Bridging Hugging Face Compatibility and High-Performance Inference
🧠 LLM Inference · Content type: Blog · odsc.medium.com · 6 days ago
SMEPilot: Characterizing and Optimizing LLM Inference with Scalable Matrix Extensions
🧠 LLM Inference · Content type: Academic · arxiv.org · 2 days ago
Running Local LLMs With Ollama For Private Development
🧠 LLM Inference · Content type: Tutorial · nazarboyko.com · 3 days ago · DEV
Google OpenRL Tames AI Model Tuning, Kubernetes-Style
🔧 MLOps · cloudnativenow.com · 16 hours ago · Covers: Best place for learning Kubernetes?; sgl-project/sglang; +3 more
All sorts of famous Attention Layers
🧠 LLM Inference · Content type: Blog · harsh-ps-2003.bearblog.dev · 5 days ago
Lemonade SDK Adds Nvidia CUDA Support
🧠 LLM Inference · i-programmer.info · 2 days ago · Covers: Show HN: Lemonade: Run LLMs Locally with GPU and NPU Acceleration
Modular: Day Zero: MiniMax M3 Open Weights on Modular Cloud
🔧 MLOps · Content type: Blog · modular.com · 5 days ago · Covers: MiniMax M3: Frontier Coding, 1M Context, Native Multimodality — All in One Model; Coding & Agentic Frontier, 1M Context, Multimodal; +1 more
Parallelize speculative decoding with P-EAGLE on Amazon SageMaker AI
🧠 LLM Inference · Content type: Blog · aws.amazon.com · 1 day ago
Please Use My Free Software
🗄️ Storage Engines · Content type: Blog · artlu.bearblog.dev · 3 days ago