My First Multi-GPU Kernel: Writing All-to-All for AMD MI300X
gau-nernst.github.io·16h·
Discuss: Hacker News
🎯GPU Kernels
A hitchhiker's guide to CUDA programming
seanzhang.me·3d·
Discuss: Hacker News
🎯GPU Kernels
Geonum – geometric number library for unlimited dimensions with O(1) complexity
github.com·3h·
Discuss: Hacker News
✂️CUTLASS
Troubleshooting multi-GPU with 2 RTX PRO 6000 Workstation Edition
reddit.com·7h·
Discuss: r/LocalLLaMA
⏱️CUDA Events
Assessing DRAM Data Retention via Quantum-Tunneling Lifetime Mapping
dev.to·9h·
Discuss: DEV
⏱️Benchmarking
Essential Things to Know Before Upgrading Your Computer Memory
buysellram.com·59m·
Discuss: Hacker News
⚙️Systems Programming
Evolving Ray and Kubernetes together for the future of distributed AI and ML
cloud.google.com·29m
🌐Distributed Computing
Cons Should Not Cons Its Arguments, Part II: Cheney on the MTA
web.archive.org·11h·
Discuss: Hacker News
🚀Compiler Optimization
(PR) SK hynix CEO Kwak Announces the New Vision of Full Stack AI Memory Creator
techpowerup.com·13h
🔧PTX
Get Ready for Clojure, GPU, and AI in 2026 with CUDA 13.0
dragan.rocks·4d·
Discuss: Hacker News
⏱️CUDA Events
DGX Spark UMA can trick you
bartusiak.ai·3d·
Discuss: Hacker News
🎯GPU Kernels
onedraw — a GPU-driven 2D renderer
dev.to·1d·
Discuss: DEV
✂️CUTLASS
Can-t stop till you get enough
cant.bearblog.dev·23h·
Discuss: Hacker News
📜TorchScript
Utilizing Chiplet-Locality For Efficient Memory Mapping In MCM GPUs (ETRI, Sungkyunkwan Univ.)
semiengineering.com·4d
📈Occupancy Optimization
Some Fun Videos on Optimizing NES Code
bumbershootsoft.wordpress.com·1d
🚀Compiler Optimization
Building blobd: single-machine object store with sub-millisecond reads and 15 GB/s uploads
blog.wilsonl.in·17h·
Discuss: Hacker News
🌳Git Internals
Free Functions Don't Change Performance (Much)
16bpp.net·3h·
Discuss: Hacker News, r/cpp
📊Profiling Tools
Why Multimodal AI Broke the Data Pipeline — And How Daft Is Beating Ray and Spark to Fix It
hackernoon.com·12h
🧮cuDNN
Production-Ready Rate Limiter in Go: From Side Project to Distributed System
dev.to·7h·
Discuss: DEV
🐕Ruff
Cycle-accurate 6502 emulator as coroutine in Rust
github.com·2d·
📊Profiling Tools