💾 Prompt Caching
Context Reuse, KV Cache, Inference Optimization, Token Efficiency
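The tags above name the core trick behind this feed: a transformer's attention keys and values for a prompt prefix depend only on that prefix, so a server can compute them once and reuse them across requests that share it. The sketch below illustrates the idea in miniature; the `PrefixKVCache` class, its `forward` method, and the hash standing in for real K/V tensors are all illustrative inventions, not any particular serving stack's API. Production systems (vLLM's prefix caching, SGLang's RadixAttention) match prefixes per block with a radix structure rather than caching every prefix outright, which this toy does for clarity at O(n^2) memory cost.

```python
import hashlib

class PrefixKVCache:
    """Toy in-memory store: tuple(prefix tokens) -> list of fake KV entries."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _fake_kv(token):
        # Stand-in for the expensive per-token key/value projection.
        return hashlib.sha1(token.encode()).hexdigest()[:8]

    def forward(self, tokens):
        """Return KV entries for `tokens`, reusing the longest cached prefix.

        Also returns how many leading tokens were served from the cache.
        """
        cut = 0
        for i in range(len(tokens), 0, -1):       # longest cached prefix wins
            if tuple(tokens[:i]) in self._store:
                cut = i
                break
        kv = list(self._store.get(tuple(tokens[:cut]), []))
        for j in range(cut, len(tokens)):         # compute only the suffix...
            kv.append(self._fake_kv(tokens[j]))
            self._store[tuple(tokens[: j + 1])] = list(kv)  # ...and cache it
        return kv, cut

cache = PrefixKVCache()
system = "You are a helpful assistant .".split()
_, hit = cache.forward(system + "What is a KV cache ?".split())
print(hit)  # 0 -- cold cache, every token computed
_, hit = cache.forward(system + "Summarize this paper .".split())
print(hit)  # 6 -- the shared system prompt is served from the cache
```

On the second call the six shared system-prompt tokens come straight from the cache, and only the fresh suffix is computed. The first result below reports a 7.5x speedup from applying this in exactly that regime: long shared prompts, short responses.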
Scoured 28,848 posts in 70.9 ms
Prompt caching but for RL – 7.5x speedup on long-prompt/short-response workloads
🧠 LLM Inference · castform.com · 23h · Hacker News

Understanding KV Cache in LLMs and How It Affects Inference
🧠 LLM Inference · pub.towardsai.net · 4d

Reformulating KV Cache Eviction Problem for Long-Context LLM Inference
🧠 LLM Inference · arxiv.org · 1d

Structural Prompt Preservation: Keeping AI Agents on Track Across Long Sessions
🧠 Agent Memory · leithdocs.com · 6h

Compressing KV caches with a related model
🔬 RaBitQ · fergusfinn.com · 4d · Hacker News

Codex in Chrome 🤖, inside Chinese labs 🇨🇳, improving token efficiency 🛠️
🇨🇳 Chinese AI · tldr.tech · 4d

fluffypony/dothething: an autonomous AI agent: you describe the thing, it does the thing.
🔧 Agent Tooling · github.com · 3d

LKV: End-to-End Learning of Head-wise Budgets and Token Selection for LLM KV Cache Eviction
🧠 LLM Inference · arxiv.org · 1d

Improving token efficiency in GitHub Agentic Workflows
💰 Tokenomics · github.blog · 4d

Pinning a Local LLM to an RTX 5090: Five Hours, Several Faceplants, One Solid Setup
🏗️ LLM Infrastructure · buraak.com · 6d · Hacker News

RDKV: Rate-Distortion Bit Allocation for Joint Eviction and Quantization of the KV Cache
🔬 RaBitQ · arxiv.org · 16h

RateQuant: Optimal Mixed-Precision KV Cache Quantization via Rate-Distortion Theory
🔬 RaBitQ · arxiv.org · 1d

Sparse Attention as a Range Searching Problem: Towards an Inference-Efficient Index for KV Cache
🧠 LLM Inference · arxiv.org · 1d

Is 3-Bit KV Cache the Holy Grail? A Reality Check on Google’s TurboQuant
🔬 RaBitQ · pub.towardsai.net · 3d

How to Compress KV Cache in RL Post-Training? Shadow Mask Distillation for Memory-Efficient Alignment
🗜️ Vector Compression · arxiv.org · 1d

A Queueing-Theoretic Framework for Stability Analysis of LLM Inference with KV Cache Memory Constraints
🧠 LLM Inference · arxiv.org · 5d

Tutti: Making SSD-Backed KV Cache Practical for Long-Context LLM Serving
🏗️ LLM Infrastructure · arxiv.org · 6d

AdapShot: Adaptive Many-Shot In-Context Learning with Semantic-Aware KV Cache Reuse
📦 Batch Embeddings · arxiv.org · 6d

RetentiveKV: State-Space Memory for Uncertainty-Aware Multimodal KV Cache Eviction
💨 Cache-Friendly Algorithms · arxiv.org · 5d

Memory Inception: Latent-Space KV Cache Manipulation for Steering LLMs
🧠 LLM Inference · arxiv.org · 4d