Multi-GPU Communication, Collective Operations, Distributed Training, AllReduce

My First Multi-GPU Kernel: Writing All-to-All for AMD MI300X
gau-nernst.github.io·4h
Discuss: Hacker News
🎯GPU Kernels
Learning Sparse Approximate Inverse Preconditioners for Conjugate Gradient Solvers on GPUs
arxiv.org·42m
🎯Tensor Cores
Deep Integration and the Convergence of Model Architecture and Hardware in AI
dev.to·9h
Discuss: DEV
🎯Tensor Cores
Gated DeltaNet (Linear Attention variant in Qwen3-Next and Kimi Linear)
sebastianraschka.com·2h
Discuss: r/LLM
👁️Attention Optimization
Show HN: GPU-accelerated sandboxes for running AI coding agents in parallel [video]
youtube.com·2d
Discuss: Hacker News
🤖AI Coding Tools
ZkML Breakthrough: 13B Models Verified in 15 Minutes
lightcapai.medium.com·13h
Discuss: Hacker News
🎯Tensor Cores
GPU Pro – Master Your AI Workflow
github.com·10h
Discuss: Hacker News
🔍Nsight
A faster problem-solving tool that guarantees feasibility
news.mit.edu·42m
⚡ONNX Runtime
AndesVL Technical Report: An Efficient Mobile-side Multimodal Large Language Model
paperium.net·1d
Discuss: DEV
🏎️TensorRT
onedraw — a GPU-driven 2D renderer
dev.to·16h
Discuss: DEV
✂️CUTLASS
A Practitioner's Guide to Kolmogorov-Arnold Networks
arxiviq.substack.com·11h
Discuss: Substack
📉Model Quantization
VISTA Score: Verification In Sequential Turn-based Assessment
arxiv.org·42m
🔄ONNX
The Role of GPUs in Accelerating Deep Learning Training
acecloud.ai·3d
Discuss: DEV
🏎️TensorRT
Moving past speculation: How deterministic CPUs deliver predictable AI performance
venturebeat.com·1d
🧠CPU Architecture
I made a tensor runtime & inference framework in C (good for learning how inference works)
github.com·4h
📜TorchScript
Building Yantra: A Visual Workflow Automation Engine
patali.dev·2h
Discuss: Hacker News
🤖Automation
Rethinking Networking for the AI/ML Era
lukew.com·2d
🌐Distributed Computing
Federico Biancuzzi, Shane Warden, & Anders Hejlsberg
deprogrammaticaipsum.com·2h
💡LSP
AI-Driven Biomarker Discovery for Accelerated Orphan Drug Development
dev.to·3h
Discuss: DEV
🔄ONNX
Opportunistically Parallel Lambda Calculus
dl.acm.org·3d
Discuss: Hacker News
💡LSP