🎮 GPU Microarchitecture - ndjenks · Scour

AMMA: A Multi-Chiplet Memory-Centric Architecture for Low-Latency 1M Context Attention Serving 🗄️CUDA Memory

Microsoft Previews Shader Model 6.10 with New DirectX GPU Features 🔴ROCm

hothardware.com·1d

Precision Timed (PRET) Machines Edward Lee ⚙️ISA Design

ptolemy.berkeley.edu·5d

PCIe 7.0 fundamentals: Baseline ordering rules 🔴ROCm

What the DRAM Crunch Teaches Us About System Design 🗄️CUDA Memory

eetimes.com·2d·Hacker News, r/hardware

Show HN: Utilyze, an open source GPU monitoring tool more accurate than nvtop 🔴ROCm

systalyze.com·3d·Hacker News

Inside Google's TPU V8 strategy, delivering two chips for two crucial tasks — scale up size gives chips an advantage over Nvidia AI accelerators ⚡PTX

tomshardware.com·2d

No Tile Left Behind: Multiprogramming for Surface-Code Architectures 🖥️Bytecode VMs

DirectX 12 Agility SDK 1.619 introduces Shader Model 6.9: Microsoft is bringing modern GPU features out of preview and into everyday use ⚡PTX

igorslab.de·5d

Qwen 3.6-35B-A3B KV cache bench: f16 vs q8_0 vs turbo3 vs turbo4 from 0 to 1M context on M5 Max 🔧Custom CPUs

llmkube.com·2d·r/LocalLLaMA

NoC Coherency Challenges Balloon With AI SoCs And Chiplets 🌐Distributed Systems

semiengineering.com·8h

Show HN: 1990s Game Dev Algorithms for Distributed Systems 🌐Distributed Systems

docs.merca.earth·1d·Hacker News

DeepSeek-V4 on Day 0: From Fast Inference to Verified RL with SGLang and Miles 🏗️AI Infrastructure

lmsys.org·4d·Hacker News

Microsoft's Shader Model 6.10 Opens Direct Access to GPU AI Engines 🔴ROCm

techpowerup.com·2d

D3D12 LinAlg Matrix Preview ⚡PTX

devblogs.microsoft.com·2d

Inside the Surprising Performance Gaps Between ‘Identical’ GPUs 🖥️GPU Drivers

spectrum.ieee.org·6d

New Intel driver lets you dedicate 93% of system memory to the iGPU for VRAM, enabling support for larger AI models 🖥️GPU Drivers

tweaktown.com·2d

Revealing NVIDIA Closed-Source Driver Command Streams for CPU-GPU Runtime Behavior Insight 🖥️GPU Drivers

What 2x GH200 delivers: memory paths for LLM inference 🔴ROCm

dnhkng.github.io·5d

Show HN: I built a 2nd-order PyTorch optimizer for LLMs that runs on 16GB GPUs 🔴ROCm

news.ycombinator.com·1d·Hacker News
