🔧 MLOps - jasonvh · Scour

Google Shrank Gemma 4 by 72% and Unsloth Fixed the 4-Bit Bug Nobody Else Caught on One 4090, and 4-Bit Shouldn’t Be This Good

🧠LLMs Blog

towardsai.net·

GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)

vettedconsumer.com··Hacker News

Introducing Granite Libraries and Project Granite Switch

🧠LLMs Blog

research.ibm.com··Hacker News

OpenPCC: Open and Confidential LLM Serving on Commodity TEEs

🧠LLMs Academic

mirkolenz/llmhop: Tiny, stateless Go router that dispatches OpenAI-compatible requests to single-model vLLM and sglang backends with zero external dependencies

🧠LLMs Code

github.com··Hacker News

Florian Brand, Prime Intellect research engineer, adopts Gemma 4 E4B 6-bit quantized as his primary local Mac LLM

🧠LLMs News

digg.com··Hacker News

Google's new open model DiffusionGemma generates text from noise instead of word by word

the-decoder.com·

Where to Host Your Open-Source Model (Under 10B Parameters)

digitalocean.com·

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

🧠LLMs News Blog

blog.google··Hacker News

Youssof Altoukhi (@Youssofal_)

xcancel.com··r/LocalLLaMA

Breaking free of a single datacenter: Practical geo-distributed AI operations with the k0smos platforms

🧠LLMs Blog

MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better

🧠LLMs News Blog

kaitchup.substack.com··r/LocalLLaMA

heterodoxin/graphkv: Graph-guided KV cache compression for memory-efficient LLM inference.

🧠LLMs Code

github.com··r/LocalLLaMA

#068 - Apple runs Siri on Google's Gemini, OpenAI files a secret IPO at $852B, Xiaomi clocks 1,000 tps

indiehacker.news·

Alignment Collapse Under KV Cache Quantization: Diagnosis and Mitigation

🧠LLMs Academic

[eCHO News] Episode #104: mTLS for Cilium. Lisp for eBPF

isovalent-9197153.hs-sites.com·

Build a local voice agent with Red Hat OpenShift AI

developers.redhat.com·

Using local LLMs for agentic coding

🧠LLMs Blog

blog.alexewerlof.com·

Build a Medical Report Analyzer on Dedicated Inference with Python

digitalocean.com·

KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++

🧠LLMs Code

github.com··Hacker News
