Scour
💾 Prompt Caching
Specific
Context Reuse, KV Cache, Inference Optimization, Token Efficiency
Scoured 179 posts in 29.9 ms
Agentically optimizing LLM prompt cache TTLs for fun and profit · ⚙️ Mechanical Sympathy · blog.firetiger.com · 2d · Hacker News
InferenceBench: A Benchmark for Open-Ended Inference Optimization by AI Agents · 🏗️ LLM Infrastructure · inferencebench.ai · 4h · Hacker News
The Missing Engineering Stack for Production AI Agents · 💻 Coding Agents · indiehackers.com · 3d
KV Cache and Flash Attention with interactive diagrams · 💨 Cache-Friendly Algorithms · kvcache.cobanov.dev · 9h · Hacker News
Block-Based Double Decoders · 🎯 Vector Quantization · arxiv.org · 1d
LLM Inference · 🧠 LLM Inference · iop.systems · 1h
How LLM Inference Works · 🧠 LLM Inference · arpitbhayani.me · 6d · Hacker News
Understanding KV Cache: The Hidden Memory Cost of Serving LLMs · 🧠 LLM Inference · melchi.me · 1d · Hacker News
DeepSeek Agent Harness: Technical deep-dive & the open-source blueprint · 🔧 Agent Tooling · dlcmh.github.io · 2h · Hacker News
KV Cache Is Becoming the Memory Hierarchy of Inference · 🧠 LLM Inference · touchdown-labs.com · 2d
RedToasty/llama.cpp_qts: Fixing --split-mode tensor, with different KV cache quantization types. · 🏗️ LLM Infrastructure · github.com · 3d · r/LocalLLaMA
HF downloader utility tampermonkey · ⛰ Alpine.js · greasyfork.org · 2d · r/LocalLLaMA
Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention · 🧠 LLM Inference · magazine.sebastianraschka.com · 4d · Hacker News ×3, r/LocalLLaMA
KV Cache Internals: How Transformers Avoid Recomputing Attention · ⚡ Prefetching · pub.towardsai.net · 1d
Building with the Claude API · 🔌 Claude Plugins · anthropic.skilljar.com · 6d
VeriCache: Turning Lossy KV Cache into Lossless LLM Inference · 🧠 LLM Inference · arxiv.org · 2d
atomicstrata/atomicmemory-sdk: Open-source AtomicMemory TypeScript SDK · 📘 Typescript · github.com · 5d · Hacker News
Not All Tokens Are Worth Caching: Learning Semantic-Aware Eviction for LLM Prefix Caches · 🔤 Tokenization · arxiv.org · 1d
DrBearJew/llama.cpp at tbq4-rdna3-experiment · 🏗️ LLM Infrastructure · github.com · 6d · r/LocalLLaMA
TriAxialKV: Toward Extreme Low-Precision KV-Cache Quantization for Agentic Inference Tasks · 🧠 LLM Inference · arxiv.org · 2d