Feeds to Scour
Scoured 18227 posts in 423.0 ms
HeteroCache: A Dynamic Retrieval Approach to Heterogeneous KV Cache Compression for Long-Context LLM Inference
arxiv.org·1d
💾Prompt Caching
Gated DeltaNet: The “Surgical Eraser” Solving Linear Attention’s Memory Problem
pub.towardsai.net·1d
📱Edge AI Optimization
meta-pytorch/segment-anything-fast: A batched offline inference oriented version of segment-anything
github.com·45m
📦Batch Embeddings
From 75% to 99.6%: The Math of LLM Ensembles
shibaprasadb.com·1d·
Discuss: Hacker News
🏆LLM Benchmarking
PRIMAL: Processing-In-Memory Based Low-Rank Adaptation for LLM Inference Accelerator
arxiv.org·1d
🏗️LLM Infrastructure
Co-optimization Approaches For Reliable and Efficient AI Acceleration (Peking University et al.)
semiengineering.com·16h
Hardware Acceleration
ChatGPT’s Laws of Machine Learning
shruggingface.com·1d
🛡️AI Security
Why AI Needs GPUs and TPUs: The Hardware Behind LLMs
blog.bytebytego.com·2d
Hardware Acceleration
The three types of LLM workloads and how to serve them
modal.com·17h·
Discuss: Hacker News
🏗️LLM Infrastructure
MLSN #18: Adversarial Diffusion, Activation Oracles, Weird Generalization
lesswrong.com·1d
🛡️AI Security
MIT’s new ‘recursive’ framework lets LLMs process 10 million tokens without context rot
venturebeat.com·1d
🏗️LLM Infrastructure
DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation
machinelearning.apple.com·1d
🕯️Candle ML
Everything Moe
ianbarber.blog·1d·
Discuss: Hacker News
🔤Tokenization
Get To Grips With Transformers And LLMs
i-programmer.info·1d·
🪄Prompt Engineering
Streamlining CUB with a Single-Call API
developer.nvidia.com·12h
🏟️Arena Allocators
Bye Bye Big Tech Step 5: AI assistants and chatbots
bitsoffreedom.nl·21m
🆕New AI
Model-agnostic linear-memory online learning in spiking neural networks
nature.com·2d
🔢BitNet Inference
ANN v3: 200ms p99 query latency over 100 billion vectors
turbopuffer.com·1d·
Discuss: Hacker News
🔮Prefetching
A Visual Guide to Quantization
newsletter.maartengrootendorst.com·2d
🔬RaBitQ
Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task
media.mit.edu·14h·
Discuss: Hacker News
👨‍💻AI Coding