Scour
🪟 Context Windows
Specific
Long Context Models, Memory Management, Attention Patterns
Scoured 120 posts in 5.7 ms
Claude Fable 5 🚀, Gemini 3.5 Live Translate 📱, scaling test time compute 📈
🤖 Agent
tldr.tech · 1 day ago
Two Leaps to 1000 Tokens/s on a 1T-Parameter Model: On Inference Systems, Execution Boundaries, and Co-Design
⚡ Inference Optimization · Content type: Blog
tilert.ai · 2 days ago · Hacker News
huawei-csl/KVarN: KVarN is a native vLLM KV-cache quantization backend for your agents: 3-5x more context, throughput above FP16, and FP16-level accuracy. Calibration-free, one flag.
⚡ Inference Optimization · Content type: Code
github.com · 6 days ago · Hacker News
From GPU to Token: The 8-Layer Observability Stack for AI Infrastructure
⚓ Kubernetes · Content type: Blog
jimmysong.io · 2 days ago
A system programmer’s guide to LLM inference
🤖 LLM · Content type: Blog
blog.xiangpeng.systems · 3 days ago · Hacker News
FOD#155: Continual Learning in LLMs: Why AI Models Need Sleep
🤖 Agent
turingpost.com · 2 days ago
Reroute, Don't Remove: Recoverable Visual Token Routing for Vision-Language Models
✨ Gemini · Content type: Academic
arxiv.org · 3 hours ago
MLPerf and the rise of latency-aware LLM benchmarking
🦙 Llama
edn.com · 5 days ago
The Memory Problem is Solved: How Google’s Memory Caching Makes RNNs Smart Again
🤖 Transformers · Content type: Blog
medium.com · 2 days ago
See, Act, Correct: three levers for working with a code agent
🎮 Reinforcement Learning · Content type: Blog
blog.owulveryck.info · 6 days ago · Hacker News
DeepSeek V4, LeCun's Bet Against LLMs, and Lovable's Self-Improving Agent - The Tokenizer Edition #30
🤖 AI
newsletter.artofsaience.com · 6 days ago
STAR-KV: Low-Rank KV Cache Compression via Soft Thresholding for Adaptive Rank Control
⚡ Inference Optimization · Content type: Academic
arxiv.org · 2 days ago
heterodoxin/graphkv: Graph-guided KV cache compression for memory-efficient LLM inference.
🤖 AI · Content type: Code
github.com · 4 days ago · r/LocalLLaMA
OpenCV 5.0 Computer Vision Library Released with Rewritten DNN Engine
👁️ Computer Vision
linuxiac.com · 2 days ago
The economics of speculative decoding
⚡ Inference Optimization · Content type: Blog
fergusfinn.com · 3 days ago · Hacker News
Alignment Collapse Under KV Cache Quantization: Diagnosis and Mitigation
⚡ Inference Optimization · Content type: Academic
arxiv.org · 1 day ago
Anthropic’s $965B Valuation: What $47B Revenue Says
🎭 Anthropic Claude · Content type: Blog, Discussion
tildalice.io · 6 days ago
Where to Host Your Open-Source Model (Under 10B Parameters)
⚡ Inference Optimization
digitalocean.com · 6 days ago
End-to-End Context Compression at Scale
🤖 Transformers · Content type: Academic
arxiv.org · 2 days ago
harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.
🤖 AI · Content type: Code
github.com · 4 days ago · Hacker News