🚀 Speculative Decoding - nayyara.airlangga

mirkolenz/llmhop: Tiny, stateless Go router that dispatches OpenAI-compatible requests to single-model vLLM and sglang backends with zero external dependencies

💾KV Cache Code

github.com·Hacker News

a local Windows app for interview prep and mock practice

🚀Model Serving

ofarwise.com·Hacker News

Qwen 3.6 27B AutoRound GGUF, need your feedback

💰Inference Cost

huggingface.co·r/LocalLLaMA

Build a Medical Report Analyzer on Dedicated Inference with Python

🧠Inference Engineering

digitalocean.com

Castlevania: Belmont's Curse release date confirmed on October 15, Japanese voice cast list also revealed

🧵Warp Scheduling

rpgsite.net

Review: The Boy with the Light-Blue Eyes - SXSW London 2026

⚡Triton

cineuropa.org

B & S About Movies podcast Episode 140: The Sons of Hercules

🔢GEMM Optimization

bandsaboutmovies.com

New rumour claims with '100%' confidence that AMD's next-gen Zen 6 desktop CPU will run at over 6.5 GHz

🧠HBM Bandwidth News

pcgamer.com

heterodoxin/graphkv: Graph-guided KV cache compression for memory-efficient LLM inference.

💾KV Cache Code

github.com·r/LocalLLaMA

the sissy boy

🚀Model Serving Blog

blog.hyeonje.website

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

💰Inference Cost News Blog

blog.google·Hacker News

Clairvoyant: Predictive SJF Scheduling to Mitigate Head-of-Line Blocking in Serial LLM Backends

🧠Inference Engineering Academic

arxiv.org

Machinic Psychopharmacology: Do LLMs Self-Medicate?

💾KV Cache

lesswrong.com·Hacker News

Barbara Gladstone Living Room

🧠HBM Bandwidth

greg.org

Google's new open model DiffusionGemma generates text from noise instead of word by word

K-Forcing: Joint Next-K-Token Decoding via Push-Forward Language Modeling

Jason McDonald

Youssof Altoukhi (@Youssofal_)

Amy Adams Brings Dario Vitale’s Versace Style to ‘The Tonight Show’

GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)
