🎮 GPU Microarchitecture - ndjenks · Scour

What 2x GH200 delivers: memory paths for LLM inference 🔴ROCm

dnhkng.github.io·5d

Could NP-hard search trees be tackled through spatial mapping of computation rather than temporal execution? 🖥️Bytecode VMs

github.com·2d·r/compsci

Flash Attention 2 in CuteDSL: A Naive Kernel, Three Optimizations, and Where I Got Stuck ⚡PTX

kyrieblunders.bearblog.dev·5d·Hacker News

Microsoft announces Shader Model 6.10 preview, bringing neural rendering into the mainstream and just maybe making the games industry a bit less reliant on Nvid... 🔴ROCm

·2d

Revealing NVIDIA Closed-Source Driver Command Streams for CPU-GPU Runtime Behavior Insight 🖥️GPU Drivers

arxiv.org·13h·Hacker News

On Interaction Nets and Hardware 🌐Distributed Systems

tendrils.co·3d·Lobsters, Hacker News

Local-Run Graph-Based Scalable AGI 🌐Distributed Systems

boggersthefish.com·5d·Hacker News

Kracuible Spiral 🌀 Memory Architecture 🗄️CUDA Memory

youtube.com·4d·r/SideProject

Microarchitecture Tailored to 3D-Stacked Near-Memory Processing LLM Decoding (U. of Edinburgh, Peking U., Cambridge et al.) 🗄️CUDA Memory

semiengineering.com·1d

You don't need an expensive GPU to run a local LLM that actually works 🔴ROCm

xda-developers.com·1d

A Matrix-Free Galerkin Multigrid Solver and Failure-Mode Screen for Single-GPU 3D SIMP Linear Systems 🔴ROCm

From 200 lines to 15: How Helion is rewriting the rules of GPU programming 🗄️CUDA Memory

developers.redhat.com·6d

Beginner-Friendly Shader Programming in p5.js v2 (lgm2026) ⚙️PTX-to-SASS

cdn.media.ccc.de·6d

Mojo language, any hardware. Systems-level performance. Pythonic syntax ⚡PTX

modular.com·6d·Hacker News

Efficient, VRAM-Constrained xLM Inference on Clients 🏗️AI Infrastructure

Fast Attention for Short Sequences 📡Signal Processing

blog.qwertyforce.dev·5d·Hacker News

Building An AI Chip: Silicon Design And Advanced Packaging ⚙️ISA Design

semiengineering.com·1d

Your CPU Has More Registers Than You'd Think 🔧Custom CPUs

fp32.org·6d·Lobsters, Hacker News

Show HN: Open-source GPU cost analysis tool 🔴ROCm

github.com·2d·Hacker News

Delegated Execution Sharding (DES): A hyper-parallelized zkEVM for theoretically optimal execution-layer scalability 🖥️Bytecode VMs

ethresear.ch·5d
