KV Cache

OpenEnv is now owned by HF, Torch, Prime Intellect, Unsloth, Modal, Mercor, and more! Use it for training agents.

 🧠Inference Engineering  Content type: Blog

Enabling KV Caching of Shared Prefix for Diffusion Language Models

 🔄Cache-Coherence  Content type: Academic
arxiv.org·

fix(gateway): fail closed for unknown model auth · openclaw/openclaw@85343ea

 🚀Speculative Decoding  Content type: Code
github.com·

Florian Brand, Prime Intellect research engineer, adopts Gemma 4 E4B 6-bit quantized as his primary local Mac LLM

 🧠Inference Engineering  Content type: News
digg.com··Hacker News

not much happened today | AINews

 🧠Inference Engineering
news.smol.ai·

Nvidia DGX Spark GB10 – AI Models and Guide with vLLM and Autonomous Script

 ⏱️Prefill Decoding  Content type: Code
github.com··Hacker News

[AINews] FrontierCode: Benchmarking for Code Quality over Slop

 🧠Inference Engineering  Content type: News
latent.space·

Using local LLMs for agentic coding

 💰Inference Cost  Content type: Blog
blog.alexewerlof.com·

Build a local voice agent with Red Hat OpenShift AI

 🎮GPU Computing
developers.redhat.com·

Introducing Granite Libraries and Project Granite Switch

 🧠Inference Engineering  Content type: Blog

Breaking free of a single datacenter: Practical geo-distributed AI operations with the k0smos platforms

 ☁️Cloud Infrastructure  Content type: Blog
cncf.io·

[eCHO News] Episode #104: mTLS for Cilium. Lisp for eBPF

 🐝eBPF

APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing

 🎮GPU Computing  Content type: Academic
arxiv.org·

mirkolenz/llmhop: Tiny, stateless Go router that dispatches OpenAI-compatible requests to single-model vLLM and sglang backends with zero external dependencies

 🧠Inference Engineering  Content type: Code
github.com··Hacker News

The Death of the Four Golden Signals: Designing Telemetry for Non-Deterministic Infrastructure

 🛡️SRE
devops.com·

Does anyone know what PCIe mode was used for these benchmarks?

 🧠Inference Engineering  Content type: Code
github.com··r/LocalLLaMA

YouZhi: Towards High-Concurrency Financial LLMs via Adaptive GQA-to-MLA Transition

 🧠Inference Engineering  Content type: Academic
arxiv.org·

#068 - Apple runs Siri on Google's Gemini, OpenAI files a secret IPO at $852B, Xiaomi clocks 1,000 tps

 💰Inference Cost
indiehacker.news·

TjWheeler/deep-memory: A GraphRAG implementation with a Vocabulary system to optimise AI integration

 🧠Inference Engineering  Content type: Code
github.com··Hacker News

google/gemma-4-12B-it-qat-q4_0-gguf

 🧠Inference Engineering
huggingface.co·