⚙️ AI Infrastructure - moznotes · Scour

KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++

🤖AI Code

github.com··Hacker News

Inferoa AI harness claimed 90% cache savings. We ran it and measured 97.8%

🧠LLM Engineering

zozo123.github.io··Hacker News

Breaking the Ice: Analyzing Cold Start Latency in vLLM

🧠LLM Engineering Academic

arxiv.org··Hacker News

DeepSeekV4 1.6T Day 0 to Day 43 Performance Over Time - Huawei, GB300 NVL72, MI355X, B200

🤖AI News

newsletter.semianalysis.com··Hacker News

DiffusionGemma: The Developer Guide - Google Developers Blog

🤖AI Blog

developers.googleblog.com··r/LocalLLaMA

Apple WWDC On-Device AI Deep Dive - Google Docs

gist.is··Hacker News

How we fight GPU scarcity without compromise

🧠LLM Engineering Blog

equixly.com··Hacker News

google/gemma-4-31B-it · fix: chat template — null handling, reasoning preservation, turn-tag balance, input validation

🧠LLM Engineering

huggingface.co··r/LocalLLaMA

DiffusionGemma: 4x Faster Text Generation

🤖AI News Blog

blog.google··Hacker News, r/LocalLLaMA, r/singularity

If Claude Fable stops helping you, you’ll never know

🧠LLM Engineering

simonwillison.net··Hacker News

Token4Token — pay-per-token inference on Gnosis + Swarm

🧠LLM Engineering

t4t.eth.link··Hacker News

MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better

🤖AI News Blog

kaitchup.substack.com··r/LocalLLaMA

Claude Fable 5 silently degrades its own performance on frontier AI work

🧠LLM Engineering News Blog

mkotlikov.substack.com··Substack

GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)

vettedconsumer.com··Hacker News

Youssof Altoukhi (@Youssofal_)

🧠LLM Engineering

xcancel.com··r/LocalLLaMA

Claude Fable 5 and new AI safety fables

🛡Cybersecurity News

interconnects.ai··Hacker News

Machinic Psychopharmacology: Do LLMs Self-Medicate?

🧠LLM Engineering

lesswrong.com··Hacker News

Nvidia DGX Spark GB10 – AI Models and Guide with vLLM and Autonomous Script

🧠LLM Engineering Code

github.com··Hacker News

Florian Brand, Prime Intellect research engineer, adopts Gemma 4 E4B 6-bit quantized as his primary local Mac LLM

🧠LLM Engineering News

digg.com··Hacker News

OpenEnv is now owned by HF, Torch, Prime Intellect, Unsloth, Modal, Mercor, and more! Use it for training agents.

🤖AI Blog

huggingface.co··Hacker News, r/LocalLLaMA
