⚡ KV Cache - linbolin1230 · Scour

detects when ML research consensus is shifting using Bayesian CUSUM

tattvaai.org··Hacker News

LLM Inference Guide: Temperature, KV Cache & Speed

🧠LLM Inference Blog

Less-relevant results

Run a local coding model with pi and LM Studio

🧠LLM Inference

zarar.dev··Covers: Pi.dev: There are many coding agents, but this one is mine, Opencode – open-source alternative to Claude Code +3 more

Sors: a Rust proxy that reorders prompts to maximize vLLM prefix cache hits

🧠LLM Inference Code

github.com··Hacker News

DiffusionGemma: Discrete diffusion in a large language model

🧠LLM Inference

idlemachines.co.uk··Hacker News

Most people use Ollama or llama.cpp for local LLMs, but these are the tools I switch to when it gets serious

🧠LLM Inference

xda-developers.com··Covers: vllm-project/vllm, sgl-project/sglang +2 more

GLM-5.2: Built for Long-Horizon Tasks

🧠LLM Inference Blog

huggingface.co··Hacker News, r/LocalLLaMA·Cited by 1 article·Covers: New model GLM-Experimental is quite good (not local so far), GLM Coding Plan for Claude Code

vLLM Internalised: The Mechanics of Modern LLM Inference

🧠LLM Inference Blog

Context compression finally works in production: new research cuts LLM input 16x without the accuracy hit

🧠LLM Inference

venturebeat.com··r/LocalLLaMA

AnchorKV: Safety-Aware KV Cache Compression via Soft Penalty with a Refusal Anchor

🧠LLM Inference Academic

zai-org/GLM-5.2 is here!

🧠LLM Inference 9

huggingface.co··Hacker News, r/LocalLLaMA·Cited by 9 articles·Covers 7 stories

Friday Five — June 12, 2026

🧠LLM Inference

Running local LLMs on the Arduino® UNO™ Q board: a practical guide

💬LLMs Blog

blog.arduino.cc

China’s DeepSeek reportedly raises $7.4B in funding at $50B+ valuation

siliconangle.com··Covers: Microsoft weighs DeepSeek for Copilot Cowork

Why Transformer Models Get Costlier as Context Grows

siliconopera.com

New comment by Greenpants in "Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?"

💬LLMs Discussion

news.ycombinator.com··Hacker News·Cited by 1 article·Covers: I Improved 15 LLMs at Coding in One Afternoon. Only the Harness Changed.

How Public AI delivers sovereign LLM inference on AWS and Intel

🧠LLM Inference Blog

aws.amazon.com··Covers: Hugging Face – Fun chat with your own Artificial Intelligence, vLLM +1 more

Cosmicgpt – A GPT-in-space simulator to research SpaceX AI satellite viability

💬LLMs Code

github.com··Hacker News

ReMP: Low-Downtime Runtime Model-Parallelism Reconfiguration for LLM Serving

🌐Distributed Systems Academic

Free LLM APIs Compared: Rate Limits, Models, and Real Costs (2026)

📄ML Papers Blog Discussion

openrouter.ai··Covers 6 stories
