Workflow Optimization, Kernel Launch Overhead, Graph Capture, Task Scheduling

eBPF Tutorial by Example: Monitoring GPU Driver Activity with Kernel Tracepoints
dev.to·1h
⏱️CUDA Events
Uncrossed Multiflows and Applications to Disjoint Paths
arxiv.org·4h
🌊CUDA Streams
My First Multi-GPU Kernel: Writing All-to-All for AMD MI300X
gau-nernst.github.io·1d
🎯GPU Kernels
Moving past speculation: How deterministic CPUs deliver predictable AI performance
venturebeat.com·2d
🧠CPU Architecture
GPU Pro – Master Your AI Workflow
github.com·1d
🔍Nsight
A hitchhiker's guide to CUDA programming
seanzhang.me·4d
🎯GPU Kernels
Evolving Ray and Kubernetes together for the future of distributed AI and ML
cloud.google.com·16h
🌐Distributed Computing
Attention Is All You Need for KV Cache in Diffusion LLMs
paperium.net·4h
🎯Tensor Cores
DCcluster-Opt: Benchmarking Dynamic Multi-Objective Optimization for Geo-Distributed Data Center Workloads
arxiv.org·4h
🔗NCCL
LangChain vs LangGraph: A Beginner’s Guide to Building Smarter AI Workflows
hackernoon.com·17h
🤖AI Coding Tools
Building Yantra: A Visual Workflow Automation Engine
patali.dev·1d
🤖Automation
Troubleshooting multi-GPU with 2 RTX PRO 6000 Workstation Edition
reddit.com·23h
⏱️CUDA Events
flowengineR: A Modular and Extensible Framework for Fair and Reproducible Workflow Design in R
arxiv.org·4h
🔄ONNX
Dive into Systems
diveintosystems.org·16h
⚙️Systems Programming
Cisco unveils Unified Edge platform for real-time AI workloads
zawya.com·1h
🔗NCCL
Disciplined Biconvex Programming
arxiv.org·4h
📉Model Quantization
Co-Simulation Framework for Parallel DNN Execution on Chiplet-Based Systems (UW–Madison, Washington State)
semiengineering.com·12h
🌊CUDA Streams
Don't let these 3 CPU specs trick you into paying more
xda-developers.com·13h
Flash Attention
I Benchmarked 3 Go Concurrency Patterns. The "Fastest" One Would Destroy Production
dev.to·3h
⏱️CUDA Events
Geonum – geometric number library for unlimited dimensions with O(1) complexity
github.com·18h
✂️CUTLASS