🤖 AI - kate.yang · Scour

harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.

💬LLMs Code

github.com··Hacker News

Report: GKE Inference Gateway delivers up to 92% faster AI responses

📰AI News Blog

cloud.google.com··Hacker News

RKSC: Reasoning-Aware KV Cache Sharing and Confident Early Exit for Multi-Step LLM Inference

💬LLMs Academic

Inferoa AI harness claimed 90% cache savings. We ran it and measured 97.8%

zozo123.github.io··Hacker News

Why LLMs (still) lack taste

✨Generative AI

beyondtheprior.com··Hacker News

Using Scikit-LLM with Open-Source LLMs

machinelearningmastery.com·

147th airhacks tv: Local LLMs, LightMetal, ZSmith Agents, AI Rails, Saving Tokens

🤖AI Agents Blog

adambien.blog·

DeepSeekV4 1.6T Day 0 to Day 43 Performance Over Time - Huawei, GB300 NVL72, MI355X, B200

💬LLMs News

newsletter.semianalysis.com··Hacker News

Apple WWDC On-Device AI Deep Dive - Google Docs

✨Generative AI

gist.is··Hacker News

NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI

📰AI News Blog

blogs.nvidia.com·

Conversational AI vs generative AI: What's the difference?

✨Generative AI

Token4Token — pay-per-token inference on Gnosis + Swarm

t4t.eth.link··Hacker News

DiffusionGemma: The Developer Guide - Google Developers Blog

📰AI News Blog

developers.googleblog.com··r/LocalLLaMA

LeLab Is Hugging Face’s New Browser-Based GUI for the LeRobot Ecosystem

📰AI News News

MLPerf and the rise of latency-aware LLM benchmarking

DiffusionGemma: 4x Faster Text Generation

✨Generative AI News Blog

blog.google··Hacker News, r/LocalLLaMA, r/singularity

Melanie Mitchell: What We Get Wrong About AI

✨Generative AI

yalereview.org··Substack, Hacker News

AMD's Lemonade SDK For Local AI Adds NVIDIA CUDA Support

✨Generative AI

Build a Medical Report Analyzer on Dedicated Inference with Python

digitalocean.com·

Google's new open model DiffusionGemma generates text from noise instead of word by word

✨Generative AI

the-decoder.com·
