A hitchhiker's guide to CUDA programming
🎯GPU Kernels
TIL: For long-lived LLM sessions, swapping KV Cache to RAM is ~10x faster than recalculating it. Why isn't this a standard feature?
🔲Loop Tiling
DGX Spark UMA can trick you
🎯GPU Kernels
Can't stop till you get enough
📜TorchScript
Utilizing Chiplet-Locality For Efficient Memory Mapping In MCM GPUs (ETRI, Sungkyunkwan Univ.)
semiengineering.com·3d
📈Occupancy Optimization
Some Fun Videos on Optimizing NES Code
bumbershootsoft.wordpress.com·1d
🚀Compiler Optimization
Moving past speculation: How deterministic CPUs deliver predictable AI performance
venturebeat.com·22h
🧠CPU Architecture
Deep Integration and the Convergence of Model Architecture and Hardware in AI
🎯Tensor Cores
Intel's killed-off BMG-X3/X4 GPUs: 3D stacked die, up to 40 GPU cores, 512MB Adamantine cache
tweaktown.com·6h
🔧PTX
PCIe lanes are the real currency of modern PCs
xda-developers.com·7h
⏱️CUDA Events
New AI models Cursor and Cognition (Windsurf) built on Chinese base models
🤖AI Coding Tools
90% RAM usage while gaming
📈GPU Occupancy