🚀 LLM serving frameworks - pleto · Scour

DiffusionGemma: The Developer Guide - Google Developers Blog

🔧Systems-level optimizations for LLM serving Blog

developers.googleblog.com · r/LocalLLaMA

Unsloth Gemma 4 QAT

✨Model optimizations in LLMs

Show HN: Run Llama.cpp In-Process from Java with Project Panama FFM

🧠Large Language Models (LLMs)

deemwar-products.github.io · Hacker News

Gemma 4 QAT on 10GB Laptop: Local AI with 6.7GB VRAM

📊AI Performance Profiling

everylocalai.com · DEV

AI Serving Platform That Adapts to Your Model

📊AI Performance Profiling Blog

databricks.com

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

✨Model optimizations in LLMs News Blog

blog.google · Hacker News

Less-relevant results

Google's new open-weights model brings image-generation tricks to AI text generation

🧠Large Language Models (LLMs) News

theregister.com

DiffusionGemma 26B A4B results on my 5090

🧠Large Language Models (LLMs)

huggingface.co · r/LocalLLaMA

Fixing a stuck Ollama runner and building a GPU watchdog

📊AI Performance Profiling

patrickmccanna.net · Hacker News

What Ollama Reveals About Local AI, Agents, and Open Models

🤖Agents using LLMs Blog

odsc.medium.com

Running Qwen 35B MoE at 450k Context on a Single 32GB GPU

📊AI Performance Profiling

local-llm.utop.workers.dev · Hacker News

What's in the Box? A Field Guide to AI Models

🧠Large Language Models (LLMs) Blog

iankduncan.com

Apples to Apples: MLX vs. Llama.cpp for Gemma 4 12B on an M1 16GB

🧠Large Language Models (LLMs) Blog

ziraph.com · Hacker News

fix(ollama): use provider thinking default in SDK session factory (#9… · openclaw/openclaw@4f3c2cd

🤖Agents using LLMs Code

2x GH200 for LLM inference, Part 2: vLLM, DeepSeek V4 Flash, and MTP

🔧Systems-level optimizations for LLM serving Blog

dnhkng.github.io

On-device AI is a margin decision

🧠Large Language Models (LLMs) Blog

ziraph.com · Hacker News

GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)

🔢Quantization of LLMs

vettedconsumer.com · Hacker News

An LLM that reviews your code, challenges your decisions, but never writes code for you

💬Prompt optimizations for LLM serving Blog

blog.adafruit.com

CommBench: Can LLMs Write Correct and Efficient GPU Communication Code?

⚙️AI Infrastructure Automation

uccl-project.github.io · Hacker News

How we fight GPU scarcity without compromise

🧠Large Language Models (LLMs) Blog

equixly.com · Hacker News
