My First Multi-GPU Kernel: Writing All-to-All for AMD MI300X
gau-nernst.github.io·1d·
Discuss: Hacker News
🎯GPU Kernels
Don't let these 3 CPU specs trick you into paying more
xda-developers.com·13h
Flash Attention
eBPF Tutorial by Example: Monitoring GPU Driver Activity with Kernel Tracepoints
dev.to·1h·
Discuss: DEV
⏱️CUDA Events
Low-Level Hacks
blog.raycursive.com·6h·
Discuss: Hacker News
📊Profiling Tools
Machine Scheduler in LLVM – Part II
myhsu.xyz·2d·
⚙️Systems Programming
Hybrid Quantum-Classical Optimization of the Resource Scheduling Problem
arxiv.org·4h
🌐Distributed Computing
Attention Is All You Need for KV Cache in Diffusion LLMs
paperium.net·4h·
Discuss: DEV
🎯Tensor Cores
Moving past speculation: How deterministic CPUs deliver predictable AI performance
venturebeat.com·2d
🧠CPU Architecture
A Friendly Tour of Process Memory on Linux
0xkato.xyz·9h·
Discuss: Hacker News
📊Profiling Tools
On the Structure of Floating-Point Noise in Batch-Invariant GPU Matrix Multiplication
arxiv.org·4h
✂️CUTLASS
Why Multimodal AI Broke the Data Pipeline — And How Daft Is Beating Ray and Spark to Fix It
hackernoon.com·1d
🧮cuDNN
90% RAM usage while gaming
preview.redd.it·2d·
Discuss: r/computers
📈GPU Occupancy
Strix Halo's Memory Subsystem: Tackling iGPU Challenges
chipsandcheese.com·3d·
Discuss: Hacker News
📈GPU Occupancy
Essential Things to Know Before Upgrading Your Computer Memory
buysellram.com·16h·
Discuss: Hacker News
⚙️Systems Programming
I want to run 8x 5060 ti to run gpt-oss 120b
reddit.com·16h·
Discuss: r/LocalLLaMA
🎯GPU Kernels
Dive into Systems
diveintosystems.org·16h·
Discuss: Hacker News
⚙️Systems Programming
Big-O Notation: Explained in 8 Minutes
blog.algomaster.io·5h
🚀Compiler Optimization
A hitchhiker's guide to CUDA programming
seanzhang.me·4d·
Discuss: Hacker News
🎯GPU Kernels
GIGABYTE sets record DDR5 speed OC world record on Z890 AORUS Tachyon ICE mobo with 13,034 MT/s
tweaktown.com·2h
⏱️Benchmarking
Limitations of a two-pass assembler
boston.conman.org·6h
🚀Compiler Optimization