🚀 Model Serving - micaleel · Scour

huawei-csl/KVarN: KVarN is a native vLLM KV-cache quantization backend for your agents: 3-5x more context, throughput above FP16, and FP16-level accuracy. Calibration-free, one flag.

🤖AI Code

github.com··Hacker News

google/gemma-4-31B-it · fix: chat template — null handling, reasoning preservation, turn-tag balance, input validation

huggingface.co··r/LocalLLaMA

DiffusionGemma: The Developer Guide - Google Developers Blog

🤖AI Blog

developers.googleblog.com··r/LocalLLaMA

DiffusionGemma: 4x Faster Text Generation

🤖AI News Blog

blog.google··Hacker News, r/LocalLLaMA, r/singularity

MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better

🤖AI News Blog

kaitchup.substack.com··r/LocalLLaMA

NVIDIA and LG Group Build an AI Factory to Advance Physical AI, Mobility and AI Infrastructure

🧠Deep Learning Blog

blogs.nvidia.com··Hacker News

Machinic Psychopharmacology: Do LLMs Self-Medicate?

lesswrong.com··Hacker News

Youssof Altoukhi (@Youssofal_)

xcancel.com··r/LocalLLaMA

A drop-in replacement chat template for google/gemma-4-31B-it tuned for open-source agentic coding harnesses.

🐍Programming

gist.github.com··r/LocalLLaMA

sgl-project/sglang-omni: SGLang Omni: High-Performance Multi-Stage Pipeline Framework for Omni Models

🔨LLVM Code

OpenEnv is now owned by HF, Torch, Prime Intellect, Unsloth, Modal, Mercor, and more! Use it for training agents.

🤖AI Blog

huggingface.co··Hacker News, r/LocalLLaMA

heterodoxin/graphkv: Graph-guided KV cache compression for memory-efficient LLM inference.

🤖AI Code

github.com··r/LocalLLaMA

Does anyone know what PCIe mode was used for these benchmarks?

🤖AI Code

github.com··r/LocalLLaMA
