Can't stop till you get enough
TorchScript
Learning Sparse Approximate Inverse Preconditioners for Conjugate Gradient Solvers on GPUs
arxiv.org · 42m
NCCL
onedraw: a GPU-driven 2D renderer
CUTLASS
A hitchhiker's guide to CUDA programming
GPU Kernels
Gated DeltaNet (Linear Attention variant in Qwen3-Next and Kimi Linear)
Attention Optimization
I made a tensor runtime & inference framework in C (good for learning how inference works)
TorchScript
Intel's killed-off BMG-X3/X4 GPUs: 3D stacked die, up to 40 GPU cores, 512MB Adamantine cache
tweaktown.com · 8h
PTX
Scalable In-Memory Associative Processing for Graph Neural Network Inference
Flash Attention
A unified threshold-constrained optimization framework for consistent and interpretable cross-machine condition monitoring
sciencedirect.com · 1d
Benchmarking
Deep Neural Watermarking for Robust Copyright Protection in 3D Point Clouds
arxiv.org · 42m
cuDNN
Text rendering and effects using GPU-computed distances
blog.pkh.me · 1d
CUTLASS
Integer overflow checking with C23
blog.gnoack.org · 9h
Static Analysis