🧠 LLMs - Yezi · Scour

harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.

💾Caching Code

github.com··Hacker News

Why LLMs (still) lack taste

beyondtheprior.com··Hacker News

Inferoa AI harness claimed 90% cache savings. We ran it and measured 97.8%

zozo123.github.io··Hacker News

NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI

⚡Performance Blog

blogs.nvidia.com·

DeepSeekV4 1.6T Day 0 to Day 43 Performance Over Time - Huawei, GB300 NVL72, MI355X, B200

⚡Performance News

newsletter.semianalysis.com··Hacker News

Build a Medical Report Analyzer on Dedicated Inference with Python

digitalocean.com·

Get officially certified in Claude AI for just $19.99

A free diagnostic for the Claude Certified Architect exam

🤖AI Agents Discussion Tutorial

claudecertifiedarchitects.com··Hacker News

AI 101: From Prompt Engineering to Skill Engineering

turingpost.com·

KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++

✨OpenAI Code

github.com··Hacker News

DiffusionGemma: 4x Faster Text Generation

⚡Performance News Blog

blog.google··Hacker News, r/LocalLLaMA, r/singularity

google/gemma-4-31B-it · fix: chat template — null handling, reasoning preservation, turn-tag balance, input validation

huggingface.co··r/LocalLLaMA

Token4Token — pay-per-token inference on Gnosis + Swarm

t4t.eth.link··Hacker News

SLUUG Talk: Demystifying Large Language Models on Linux

🤖AI Agents Code

github.com··DEV

The Anthropic leader who built Claude Code says he ditched prompting — now he just writes loops.

thenewstack.io·

DiffusionGemma: The Developer Guide - Google Developers Blog

⚡Performance Blog

developers.googleblog.com··r/LocalLLaMA

Running LLM Inference on Kubernetes: What It Actually Takes

⚡Performance Blog

fairwinds.com·

Less-relevant results

our workplace LLM mass delusion

🏢Engineering Blogs Blog

blog.avas.space··Hacker News

Google Colab CLI opens runtimes to Claude Code and Codex

🗄️Databases

helpnetsecurity.com··r/ClaudeAI

heterodoxin/graphkv: Graph-guided KV cache compression for memory-efficient LLM inference.

⚡Performance Code

github.com··r/LocalLLaMA
