Scour
⚡ KV Cache
Specific
KV cache, key-value cache, attention cache, LLM inference cache
Scoured 181 posts in 72.3 ms
detects when ML research consensus is shifting using Bayesian CUSUM
🔢 Vector DBs · tattvaai.org · 6 days ago · Hacker News
LLM Inference Guide: Temperature, KV Cache & Speed
🧠 LLM Inference · Content type: Blog · medium.com · 4 days ago
Less-relevant results
Run a local coding model with pi and LM Studio
🧠 LLM Inference · zarar.dev · 1 day ago · Covers: Pi.dev: There are many coding agents, but this one is mine; Opencode – open-source alternative to Claude Code; +3 more
Sors: a Rust proxy that reorders prompts to maximize vLLM prefix cache hits
🧠 LLM Inference · Content type: Code · github.com · 1 day ago · Hacker News
DiffusionGemma: Discrete diffusion in a large language model
🧠 LLM Inference · idlemachines.co.uk · 6 days ago · Hacker News
Most people use Ollama or llama.cpp for local LLMs, but these are the tools I switch to when it gets serious
🧠 LLM Inference · xda-developers.com · 4 days ago · Covers: vllm-project/vllm; sgl-project/sglang; +2 more
GLM-5.2: Built for Long-Horizon Tasks
🧠 LLM Inference · Content type: Blog · huggingface.co · 1 day ago · Hacker News, r/LocalLLaMA · Cited by 1 article · Covers: New model GLM-Experimental is quite good (not local so far); GLM Coding Plan for Claude Code
vLLM Internalised: The Mechanics of Modern LLM Inference
🧠 LLM Inference · Content type: Blog · medium.com · 4 days ago
Context compression finally works in production: new research cuts LLM input 16x without the accuracy hit
🧠 LLM Inference · venturebeat.com · 6 days ago · r/LocalLLaMA
AnchorKV: Safety-Aware KV Cache Compression via Soft Penalty with a Refusal Anchor
🧠 LLM Inference · Content type: Academic · arxiv.org · 1 day ago
zai-org/GLM-5.2 is here!
🧠 LLM Inference · 9 articles covering this post · huggingface.co · 1 day ago · Hacker News, Hacker News, r/LocalLLaMA · Cited by 9 articles · Covers 7 stories
Friday Five — June 12, 2026
🧠 LLM Inference · redhat.com · 6 days ago
Running local LLMs on the Arduino® UNO™ Q board: a practical guide
💬 LLMs · Content type: Blog · blog.arduino.cc · 3 hours ago
China’s DeepSeek reportedly raises $7.4B in funding at $50B+ valuation
🤖 AI Agents · siliconangle.com · 1 day ago · Covers: Microsoft weighs DeepSeek for Copilot Cowork
Why Transformer Models Get Costlier as Context Grows
💬 LLMs · siliconopera.com · 6 days ago
New comment by Greenpants in "Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?"
💬 LLMs · Content type: Discussion · news.ycombinator.com · 2 days ago · Hacker News · Cited by 1 article · Covers: I Improved 15 LLMs at Coding in One Afternoon. Only the Harness Changed.
How Public AI delivers sovereign LLM inference on AWS and Intel
🧠 LLM Inference · Content type: Blog · aws.amazon.com · 3 days ago · Covers: Hugging Face – Fun chat with your own Artificial Intelligence; vLLM; +1 more
Cosmicgpt – A GPT-in-space simulator to research SpaceX AI satellite viability
💬 LLMs · Content type: Code · github.com · 1 day ago · Hacker News
ReMP: Low-Downtime Runtime Model-Parallelism Reconfiguration for LLM Serving
🌐 Distributed Systems · Content type: Academic · arxiv.org · 12 hours ago
Free LLM APIs Compared: Rate Limits, Models, and Real Costs (2026)
📄 ML Papers · Content type: Blog, Discussion · openrouter.ai · 2 days ago · Covers 6 stories