KV Cache


huawei-csl/KVarN: KVarN is a native vLLM KV-cache quantization backend for your agents: 3-5x more context, throughput above FP16, and FP16-level accuracy. Calibration-free, one flag.

 Inference  Content type: Code
github.com··Hacker News
Less-relevant results

Report: GKE Inference Gateway delivers up to 92% faster AI responses

 ☁️GCP  Content type: Blog

How to Tune llama.cpp --n-gpu-layers: A Practical VRAM Guide (2026)

 🖥️Local AI  Content type: Blog
dev.to··DEV

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

 🔓Open Source AI  Content type: News  Content type: Blog
blog.google··Hacker News

KVarN, Cost.dev, headroom — the week the agent runtime bill got itemized

 🤖AI Inference  Content type: Blog
dev.to··DEV

DeepSeek V4, LeCun's Bet Against LLMs, and Lovable's Self-Improving Agent - The Tokenizer Edition #30

 🤖AI

FOD#155: Continual Learning in LLMs: Why AI Models Need Sleep

 🤖Large Language Models
turingpost.com·

KV cache quantization: what FP8/INT8 K and V actually buy you, and where they break

 Quantization  Content type: Blog
dev.to··DEV

shoo99/paper-rag: A private, fully-local RAG over your own PDFs: BGE-M3 + embedded Qdrant + a local LLM via Ollama. ~150 lines, nothing leaves your machine.

 🤖Large Language Models  Content type: Code
github.com··DEV

Why most LLM VRAM calculators are wrong on modern models (and an open-source MIT fix)

 🔓Open Source AI  Content type: Blog
dev.to··DEV

How to Tune --n-gpu-layers for Your VRAM Budget

 📊Compute Markets  Content type: Blog
dev.to··DEV

LLM Research Papers: The 2026 List (January to May)

 🤖AI  Content type: News

Stateful Swarms: How Persistent Memory Beats Traditional Agent Architectures

 📊Compute Markets  Content type: News

Why Self-Hosted Claude Code Was 15 Slower Than It Should Be

 🧠LLMs  Content type: Blog
dev.to··DEV

I Benchmarked 3 Local LLMs on My Laptop — Here's What the Numbers Actually Show

 🧠LLM  Content type: Blog
dev.to··DEV

The Hidden Contract of Mastery: Why Complexity Is Yours to Absorb

 📊ML Research  Content type: Blog
dev.to··DEV

Hello, dev.to — I'm Victor: 10+ years full-stack, two CTO runs, now solo and writing in the open

 🤖Large Language Models  Content type: Blog
dev.to··DEV

How Automating RAG Memory Tests with ChromaDB Quadrupled Our Bug Discovery Rate

 🧠LLM Reasoning  Content type: Blog
dev.to··DEV

Your Copilot Just Got a Local Brain

 🤖AI  Content type: Blog
dev.to··DEV

Headroom: Cut Your LLM Token Usage by Up to 95% Without Changing Your Answers

 🔧MCP  Content type: Blog
dev.to··DEV

