🗄️ KV Cache
Specific
key-value cache, attention cache, LLM inference, paged attention
DepthKV: Layer-Dependent KV Cache Pruning for Long-Context LLM Inference
🧠 LLMs · arxiv.org · 2d
Gemma 4 and Qwen 3.6 with q8_0 and q4_0 KV cache: KL divergence results
💾 Storage Engines · localbench.substack.com · 6d · r/LocalLLaMA
AmSach/kvquant: Drop-in KV cache compressor for local LLM inference - Run 70B models on 8GB RAM
🧠 LLMs · github.com · 5h · DEV
Qwen 3.6-35B-A3B KV cache bench: f16 vs q8_0 vs turbo3 vs turbo4 from 0 to 1M context on M5 Max
💾 Storage Engines · llmkube.com · 2d · r/LocalLLaMA
Intel prioritizes Xeon; CPU shortage opens door for AMD and MediaTek
🖥️ Systems Programming · digitimes.com · 3h
DeepSeek V4 Cuts KV Cache by 90% at 1M Tokens, But Aggressive Compression Could Risk ‘Needle in a Haystack’ Failures
🧠 Reasoning Models · wccftech.com · 5d
Speculative Decoding vs MoE: 3.2x Cost Gap on Llama 3
🧠 LLMs · tildalice.io · 2d
Legare Kerrison and Cedric Clyburn on LLM Performance and Evaluations
📊 LLM Evaluation · infoq.com · 2d
Skymizer Taiwan Inc. Unveils Breakthrough Architecture Enabling Ultra-Large LLM Inference on a Single Card
🧠 Reasoning Models · en.prnasia.com · 3d · r/LocalLLaMA
not much happened today
🌊 Streaming Algorithms · news.smol.ai · 2d
Rethinking KV Cache Eviction via a Unified Information-Theoretic Objective
🧮 Cache-Oblivious Algorithms · arxiv.org · 12h
What 2x GH200 delivers: memory paths for LLM inference
💾 Storage Engines · dnhkng.github.io · 5d
Human Memory Management And Obsidian V2
🦎 Zig Allocators · grahamhelton.com · 2d
Lumai Launches the World’s First Optical Computing System for Real-Time, Billion-Parameter LLM Inference
🧠 Reasoning Models · globenewswire.com · 2d
New comment by zigzag312 in "Rust Memory Management: Ownership vs. Reference Counting"
🖥️ Systems Programming · news.ycombinator.com · 3d · Hacker News
When Hidden States Drift: Can KV Caches Rescue Long-Range Speculative Decoding?
🧮 Cache-Oblivious Algorithms · arxiv.org · 12h
I got a $134 Cloudflare D1 bill. Here's how I cut it 95%
💾 Storage Engines · fullstacksveltekit.com · 3d · Hacker News
libtalloc_debugging(3) Linux Manual Page
🖥️ Systems Programming · systutorials.com · 5d
DAK: Direct-Access-Enabled GPU Memory Offloading with Optimal Efficiency for LLM Inference
🧠 Reasoning Models · arxiv.org · 12h
SecureDrop 2.15.1 released
🦀 Rust · securedrop.org · 6d