Asynchronous Execution, Kernel Overlap, GPU Concurrency, Pipeline Parallelism

A hitchhiker's guide to CUDA programming
seanzhang.me·4d
🎯GPU Kernels
Moving past speculation: How deterministic CPUs deliver predictable AI performance
venturebeat.com·2d
🧠CPU Architecture
flowengineR: A Modular and Extensible Framework for Fair and Reproducible Workflow Design in R
arxiv.org·3h
🔄ONNX
My First Multi-GPU Kernel: Writing All-to-All for AMD MI300X
gau-nernst.github.io·1d
🎯GPU Kernels
Co-Simulation Framework for Parallel DNN Execution on Chiplet-Based Systems (UW–Madison, Washington State)
semiengineering.com·11h
🎯Tensor Cores
eBPF Tutorial by Example: Monitoring GPU Driver Activity with Kernel Tracepoints
dev.to·58m
⏱️CUDA Events
Why Multimodal AI Broke the Data Pipeline — And How Daft Is Beating Ray and Spark to Fix It
hackernoon.com·1d
🧮cuDNN
Synopsys and NVIDIA Forge AI Powered Future for Chip Design and Multiphysics Simulation
semiwiki.com·18h
⏱️CUDA Events
On the Structure of Floating-Point Noise in Batch-Invariant GPU Matrix Multiplication
arxiv.org·3h
✂️CUTLASS
Evolving Ray and Kubernetes together for the future of distributed AI and ML
cloud.google.com·15h
🌐Distributed Computing
Attention Is All You Need for KV Cache in Diffusion LLMs
paperium.net·3h
🎯Tensor Cores
Geonum – geometric number library for unlimited dimensions with O(1) complexity
github.com·17h
✂️CUTLASS
Don't let these 3 CPU specs trick you into paying more
xda-developers.com·12h
Flash Attention
Building Yantra: A Visual Workflow Automation Engine
patali.dev·1d
🤖Automation
ZkML Breakthrough: 13B Models Verified in 15 Minutes
lightcapai.medium.com·1d
🎯Tensor Cores
Hydra: Dual Exponentiated Memory for Multivariate Time Series Analysis
arxiv.org·3h
📊Gradient Accumulation
Uncrossed Multiflows and Applications to Disjoint Paths
arxiv.org·3h
📊CUDA Graphs
Troubleshooting multi-GPU with 2 RTX PRO 6000 Workstation Edition
reddit.com·22h
⏱️CUDA Events
PDE-SHARP: PDE Solver Hybrids Through Analysis & Refinement Passes
arxiv.org·3h
✂️CUTLASS
Scaling up Prime Video monitoring service reduced costs 90% (archive) (2023)
web.archive.org·9h
🏗️Build Optimization