🧠 LLMs - dmndxld

🔧MLOps Discussion

news.ycombinator.com··Hacker News

Google open-sources speedy DiffusionGemma text diffusion model

💻GPU Computing

siliconangle.com·

Expanding Apple Foundation Models to support image inputs and running in private cloud compute is a huge upgrade. The old models really weren’t capable...

🔧MLOps

manton.org·

Google Gemma4 12B released

🔥PyTorch Blog

medium.com·

Gemma 4 QAT on 10GB Laptop: Local AI with 6.7GB VRAM

💻GPU Computing

everylocalai.com··DEV

Inferoa AI harness claimed 90% cache savings. We ran it and measured 97.8%

🔧MLOps

zozo123.github.io··Hacker News

How we fight GPU scarcity without compromise

💻GPU Computing Blog

equixly.com··Hacker News

AMD's Lemonade SDK For Local AI Adds NVIDIA CUDA Support

🏗️AI Infra

phoronix.com··r/artificial

147th airhacks tv: Local LLMs, LightMetal, ZSmith Agents, AI Rails, Saving Tokens

🔧MLOps Blog

adambien.blog·

A Plea to the Labs: Let the Models Diagnose.

🔧MLOps Blog

tangent.bearblog.dev··Hacker News

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

💻GPU Computing News Blog

blog.google··Hacker News

How Gemma Collins’ dad saved her from financial ruin & helped rake in £1.4M last year… he even lives with her & fiancé

🔥PyTorch News

thesun.co.uk

Fine-tuning Multi-modal LLMs with ART: Art-based Reinforcement Training

⚡Distributed Training Academic

arxiv.org·

Report: GKE Inference Gateway delivers up to 92% faster AI responses

🏗️AI Infra Blog

cloud.google.com··Hacker News

How J.A.R.V.I.S. Became the Smartest Mind on Earth — What is an LLM?

🏗️AI Infra Blog

medium.com·

Humans and LLMs share a mental disorder: Fugue Lock

🐧Operating Systems

vwwwv.org··Hacker News

Gemma 4 31B Runs Fastest on SambaCloud

🌐Networking

sambanova.ai·

google/gemma-4-31B-it · fix: chat template — null handling, reasoning preservation, turn-tag balance, input validation

🐍Python

huggingface.co··r/LocalLLaMA

defai-digital/ax-engine: Apple Silicon LLM runtime supporting Gemma 4 and Qwen 3.6 MTP modes

🐍Python Code

github.com··Hacker News

MTP Isn't Always a Win: 1.95x on My 3090, but Speculative Decoding Is Hardware-Dependent

Ask HN: Any Local LLM can I run without GPU for Local Agentic workflow AI?
