✨ Model optimizations in LLMs - pleto · Scour

The Inference Alpha: Maximizing Frontier Models on AMD

🔧Systems-level optimizations for LLM serving Blog

digitalocean.com

Train Models Faster with JAX and MaxText Using NVFP4 on NVIDIA Blackwell

🧠Large Language Models (LLMs) News Blog

developer.nvidia.com

Shrinking a Neural Network Often Makes It Smarter

🧠Large Language Models (LLMs)

siliconopera.com

Making Local LLM Fast

🧠Large Language Models (LLMs)

bogdan.nimblex.net · Hacker News

Google releases Gemma 4 QAT models for local AI on enterprise laptops

📊AI Performance Profiling

MTP Isn't Always a Win: 1.95x on My 3090, but Speculative Decoding Is Hardware-Dependent

🧠Large Language Models (LLMs) Blog

bric.pe.kr · DEV

Trainable Smooth-Rotation Transforms with Learned Channel Scales for LLM Quantization

🔢Quantization of LLMs Academic

DiffusionGemma 26B A4B results on my 5090

🧠Large Language Models (LLMs)

huggingface.co · r/LocalLLaMA

MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 TPS

🔧Systems-level optimizations for LLM serving Blog

mimo.xiaomi.com · Hacker News, r/LocalLLaMA

iOS Security SDKs & Audits for Production Teams

🔧Systems-level optimizations for LLM serving Discussion

sentinelden.com · Hacker News

Nvidia DGX Spark GB10 – AI Models and Guide with vLLM and Autonomous Script

🔧Systems-level optimizations for LLM serving Code

github.com · Hacker News

On-device AI is a margin decision

🧠Large Language Models (LLMs) Blog

ziraph.com · Hacker News

6. Air-Gapped Claude Code - The Claude Code SRE Handbook

🚀LLM serving frameworks

har-ki.github.io · Hacker News

Week Links [1st June 2026]

🔢Quantization of LLMs

jackharrington.xyz

Local AI has a hardware accessibility problem, and the answer to it isn't RTX Spark

🧠Large Language Models (LLMs)

xda-developers.com

TFLite Edge Model Quantizer Snippet

🔢Quantization of LLMs

itsevilduck.gumroad.com · DEV

Less-relevant results

The key steps that will enable organizations to scale Physical AI

⚙️AI Infrastructure Automation

The week AI infrastructure crossed from a technology story to a financial one

🧠Large Language Models (LLMs) News

Your robot can’t be smart, fast, and free. Evolution solved that already.

⚡Real-time AI Systems News

thenextweb.com

OpenAI’s IPO Math: $25B Revenue, $27B Burn Rate

🧠Large Language Models (LLMs) Blog Discussion
