Workflow Optimization, Kernel Launch Overhead, Graph Capture, Task Scheduling

A hitchhiker's guide to CUDA programming
seanzhang.me·2d·
Discuss: Hacker News
🎯GPU Kernels
You Don't Always Need Grafana for GPU Monitoring
dev.to·10h·
Discuss: DEV
🔍Nsight
Show HN: GPU-accelerated sandboxes for running AI coding agents in parallel [video]
youtube.com·1d·
Discuss: Hacker News
🔗NCCL
A unified threshold-constrained optimization framework for consistent and interpretable cross-machine condition monitoring
sciencedirect.com·13h
⏱️Benchmarking
Machine Scheduler in LLVM – Part II
myhsu.xyz·4h·
Discuss: Hacker News
📈Occupancy Optimization
Structurally Valid Log Generation using FSM-GFlowNets
arxiv.org·2d
🔄ONNX
Chiplet Chokepoints: Optimizing Interconnects for Peak AI Performance
dev.to·3d·
Discuss: DEV
🌊CUDA Streams
NVIDIA and Samsung working even closer together, new semiconductor AI factory has 50,000+ GPUs
tweaktown.com·8h
🔍Nsight
Get Ready for Clojure, GPU, and AI in 2026 with CUDA 13.0
dragan.rocks·2d·
Discuss: Hacker News
⏱️CUDA Events
Enhancing Workflow Efficiency via Dynamic Task Prioritization & Adaptive Resource Allocation
dev.to·2d·
Discuss: DEV
🔗NCCL
Cycle-accurate 6502 emulator as coroutine in Rust
github.com·19h·
📊Profiling Tools
How to Choose the Right GPU for Your Machine Learning Projects
acecloud.ai·3d·
Discuss: DEV
🔧PTX
Building a Prompt Engineering Toolkit for Developers
amzn.to·8h·
Discuss: DEV
🤖AI Coding Tools
Ambient CI, progress this year
blog.liw.fi·3h
🏗️Build Systems
Perfetto: Swiss Army Knife for Linux Client Tracing
lalitm.com·2d·
📊Profiling Tools
Text rendering and effects using GPU-computed distances
blog.pkh.me·16h·
✂️CUTLASS
A Coding Implementation of a Comprehensive Enterprise AI Benchmarking Framework to Evaluate...
marktechpost.com·1d
🤖AI Coding Tools
Bringing Ideas to Life with 3D Design and Smart Performance Tools
bottleneckscalculators.com·2d·
Discuss: DEV
⏱️CUDA Events
Utilizing Chiplet-Locality For Efficient Memory Mapping In MCM GPUs (ETRI, Sungkyunkwan Univ.)
semiengineering.com·2d
📈Occupancy Optimization