📊 AI Performance Profiling - pleto · Scour

Computex 2026: XMG Pro 18 Brings 12 GB VRAM to Upper Mid-Range Mobile Segment

🔧Systems-level optimizations for LLM serving

techpowerup.com·

DiffusionGemma: 4x Faster Text Generation

🧠Large Language Models (LLMs) News Blog

blog.google··Hacker News, r/LocalLLaMA, r/singularity

How we fight GPU scarcity without compromise

🧠Large Language Models (LLMs) Blog

equixly.com··Hacker News

Microsoft just shared the frontier data engineering secrets

🤖Agents using LLMs

mail.bycloud.ai·

Luce Spark: a 35B MoE on a 16 GB GPU, without the offload tax

🔧Systems-level optimizations for LLM serving Blog

lucebox.com··Hacker News

DiffusionGemma: The Developer Guide - Google Developers Blog

🚀LLM serving frameworks Blog

developers.googleblog.com··r/LocalLLaMA

WEKA software speeds long context AI inferencing on Oracle’s public cloud

🔧Systems-level optimizations for LLM serving News

blocksandfiles.com·

Stop Wasting GPU Budget: Autoscaling AI Inference on Kubernetes with KEDA

⚙️AI Infrastructure Automation

cloudnativenow.com·

sgl-project/sglang-omni: SGLang Omni: High-Performance Multi-Stage Pipeline Framework for Omni Models

🔧Systems-level optimizations for LLM serving Code

Nutanix Unified Storage Earns Enterprise-Level NVIDIA Certification for Production AI Workloads

🔍Retrieval-augmented generation

storagereview.com·

Google's new open model DiffusionGemma generates text from noise instead of word by word

🧠Large Language Models (LLMs)

the-decoder.com·

Tired of GitHub Trending being GitHub-only, so we made a multi-forge version (GitLab and Codeberg included)

🤖Agents using LLMs

gitgem.org··Hacker News, r/opensource

(PR) Gigabyte Announces New AERO X16 Laptop With AMD Ryzen AI 9 465 CPU and NVIDIA GeForce RTX 5070 Laptop GPU

🤖Agents using LLMs

techpowerup.com·

Fast Speech Foundation Model Distillation Using Interleaved Stacking

⚙️AI Infrastructure Automation Academic

The economics of speculative decoding

🔧Systems-level optimizations for LLM serving Blog

fergusfinn.com··Hacker News

Context engineering for AI agents: the infrastructure behind every decision

🧠Large Language Models (LLMs) Blog

A system programmer’s guide to LLM inference

🔧Systems-level optimizations for LLM serving Blog

blog.xiangpeng.systems··Hacker News

Making Local LLM Fast

🧠Large Language Models (LLMs)

bogdan.nimblex.net··Hacker News

AMD RX 9070 GRE AI, Blender benchmarks vs 9070 XT, 7800XT, Nvidia RTX 5070, 4070

⚡Real-time AI Systems

A Practical Guide to Kubernetes Multi-Tenancy: Best Practices and Approaches

⚙️AI Infrastructure Automation Blog
