⚡ vLLM - mgjain · Scour

[AINews] FrontierCode: Benchmarking for Code Quality over Slop

🧠KV Cache News

Florian Brand, Prime Intellect research engineer, adopts Gemma 4 E4B 6-bit quantized as his primary local Mac LLM

🧠KV Cache News

digg.com · Hacker News

KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++

🧠KV Cache Code

github.com · Hacker News

Breaking the Ice: Analyzing Cold Start Latency in vLLM

⚡LLM Inference Academic

arxiv.org · Hacker News

[AINews] not much happened today

⚡LLM Inference News

Show HN: Zerostack, an open coding agent optimized for memory footprint

gi-dellav.github.io · Hacker News

fix(gateway): fail closed for unknown model auth · openclaw/openclaw@85343ea

⚡LLM Inference Code

Using local LLMs for agentic coding

🧠KV Cache Blog

blog.alexewerlof.com

Five labs, five minds: building a multi-model finance drama on small models

⚡LLM Inference Blog

huggingface.co

RightNow-AI/AutoMegaKernel: An agent harness that compiles a model into one provably-correct, self-retargeting CUDA megakernel and self-tunes it past cuBLAS at batch-1 LLM decode.

🧠KV Cache Code

github.com · Hacker News

A drop-in replacement chat template for google/gemma-4-31B-it tuned for open-source agentic coding harnesses.

⚡LLM Inference

gist.github.com · r/LocalLLaMA

Clairvoyant: Predictive SJF Scheduling to Mitigate Head-of-Line Blocking in Serial LLM Backends

🧠KV Cache Academic

mirkolenz/llmhop: Tiny, stateless Go router that dispatches OpenAI-compatible requests to single-model vLLM and sglang backends with zero external dependencies

🧠KV Cache Code

github.com · Hacker News

Google Gemma4 12B released

⚡LLM Inference Blog

not much happened today | AINews

google/gemma-4-12B-it-qat-q4_0-gguf

huggingface.co

Does anyone know what PCIe mode was used for these benchmarks?

⚡LLM Inference Code

github.com · r/LocalLLaMA

YouZhi: Towards High-Concurrency Financial LLMs via Adaptive GQA-to-MLA Transition

🧠KV Cache Academic

Introducing Granite Libraries and Project Granite Switch

🧠KV Cache Blog

research.ibm.com · Hacker News

DiffusionGemma: The Developer Guide

🧠KV Cache Blog

developers.googleblog.com
