CUDA Linear Algebra, Matrix Operations, GPU BLAS, cuBLASLt

Geonum – geometric number library for unlimited dimensions with O(1) complexity
github.com·5h·
Discuss: Hacker News
✂️CUTLASS
My First Multi-GPU Kernel: Writing All-to-All for AMD MI300X
gau-nernst.github.io·18h·
Discuss: Hacker News
🎯GPU Kernels
Tetrahedral analog of the Pythagorean theorem
johndcook.com·3h
✂️CUTLASS
Can-t stop till you get enough
cant.bearblog.dev·1d·
Discuss: Hacker News
📜TorchScript
onedraw — a GPU-driven 2D renderer
dev.to·1d·
Discuss: DEV
✂️CUTLASS
Learning Sparse Approximate Inverse Preconditioners for Conjugate Gradient Solvers on GPUs
arxiv.org·15h
🔗NCCL
A hitchhiker's guide to CUDA programming
seanzhang.me·3d·
Discuss: Hacker News
🎯GPU Kernels
A fun application of Green’s functions and geometric algebra: Residue calculus
peeterjoot.com·15h
✂️CUTLASS
Dive into Systems
diveintosystems.org·3h·
Discuss: Hacker News
⚙️Systems Programming
Why Multimodal AI Broke the Data Pipeline — And How Daft Is Beating Ray and Spark to Fix It
hackernoon.com·14h
🧮cuDNN
Looking for a partner to study graphics programming with
reddit.com·8h·
Discuss: r/gamedev
🎮NVIDIA
Synopsys and NVIDIA Forge AI Powered Future for Chip Design and Multiphysics Simulation
semiwiki.com·6h
🌊CUDA Streams
Cracking the Cube: How Competitive Rubik’s Cube Algorithms Inspire Modern AI and Programming
dev.to·1h·
Discuss: DEV
🤖AI Coding Tools
Julia 1.12 Adds Trim Feature
i-programmer.info·4h
🐕Ruff
Gated DeltaNet (Linear Attention variant in Qwen3-Next and Kimi Linear)
sebastianraschka.com·16h·
Discuss: r/LLM
👁️Attention Optimization
Fast, Scalable LDA in C++ with Stochastic Variational Inference
github.com·4h·
Discuss: r/cpp
🏎️TensorRT
ZkML Breakthrough: 13B Models Verified in 15 Minutes
lightcapai.medium.com·1d·
Discuss: Hacker News
🎯Tensor Cores
Intel's killed-off BMG-X3/X4 GPUs: 3D stacked die, up to 40 GPU cores, 512MB Adamantine cache
tweaktown.com·23h
🔧PTX
Introducing Agent-o-rama: build, trace, evaluate, and monitor stateful LLM agents in Java or Clojure
blog.redplanetlabs.com·2h·
Discuss: Hacker News
🤖AI Coding Tools
A Practitioner's Guide to Kolmogorov-Arnold Networks
arxiviq.substack.com·1d·
Discuss: Substack
📉Model Quantization