🗄️ KV Cache (specific feed)
Keywords: key-value cache, attention cache, LLM inference, paged attention

Scoured 184,158 posts in 17.5 ms

Google splits AI chips into training and inference TPUs, signaling shift toward workload-specialized AI infrastructure
🧠 Reasoning Models · digitimes.com · 6d

CacheFlow: Efficient LLM Serving with 3D-Parallel KV Cache Restoration
🧮 Cache-Oblivious Algorithms · arxiv.org · 1d

PolyKV: A Shared Asymmetrically-Compressed KV Cache Pool for Multi-Agent LLM Inference
🧠 Reasoning Models · arxiv.org · 1d

dog-qiuqiu/invincat: A native Python agent CLI built on DeepAgents CLI, featuring an independent memory Agent that captures learnings after each task and delivers efficient AI coding assistance through hierarchical memory management.
🤖 AI Agents · github.com · 4d · Hacker News

Stochastic KV Routing: Enabling Adaptive Depth-Wise Cache Sharing
🧮 Cache-Oblivious Algorithms · arxiv.org · 2d

Salca: A Sparsity-Aware Hardware Accelerator for Efficient Long-Context Attention Decoding
🧠 Reasoning Models · arxiv.org · 1d

Hardware Generation and Exploration of Lookup Table-Based Accelerators for 1.58-bit LLM Inference
🧠 Reasoning Models · arxiv.org · 1d

QFlash: Bridging Quantization and Memory Efficiency in Vision Transformer Attention
🌊 Streaming Algorithms · arxiv.org · 1d

PMZFX/intel-arc-pro-b70-benchmarks: Benchmark results and performance data for the Intel Arc Pro B70 GPU (Xe2/Battlemage) - LLM inference, video generation, dual-GPU scaling.
🛢️ Database Internals · github.com · 6d · Hacker News

PathRWKV: Enhancing Whole Slide Image Inference with Asymmetric Recurrent Modeling
🌊 Streaming Algorithms · arxiv.org · 2d

Parameter Efficiency Is Not Memory Efficiency: Rethinking Fine-Tuning for On-Device LLM Adaptation
🧠 LLMs · arxiv.org · 2d

SparKV: Overhead-Aware KV Cache Loading for Efficient On-Device LLM Inference
🧠 Reasoning Models · arxiv.org · 6d

NVLLM: A 3D NAND-Centric Architecture Enabling Edge On-Device LLM Inference
🧠 Reasoning Models · arxiv.org · 1d

Network Edge Inference for Large Language Models: Principles, Techniques, and Opportunities
🧠 LLMs · arxiv.org · 2d

FairyFuse: Multiplication-Free LLM Inference on CPUs via Fused Ternary Kernels
🔧 SMT Solvers · arxiv.org · 6d

Secure On-Premise Deployment of Open-Weights Large Language Models in Radiology: An Isolation-First Architecture with Prospective Pilot Evaluation
🧠 Reasoning Models · arxiv.org · 2d

MCAP: Deployment-Time Layer Profiling for Memory-Constrained LLM Inference
🧠 Reasoning Models · arxiv.org · 6d

DiP-SD: Distributed Pipelined Speculative Decoding for Efficient LLM Inference at the Edge
🧠 Reasoning Models · arxiv.org · 6d

SwarmDrive: Semantic V2V Coordination for Latency-Constrained Cooperative Autonomous Driving
🤝 Consensus Algorithms · arxiv.org · 2d

Spatial Metaphors for LLM Memory: A Critical Analysis of the MemPalace Architecture
🧠 LLMs · arxiv.org · 6d