Asynchronous Execution, Kernel Overlap, GPU Concurrency, Pipeline Parallelism

Small form factor, big impact: Solving edge computing’s space and performance paradox
nordot.app·16h
🌐Distributed Computing
Apache Arrow’s Final Frontier: Replacing Outdated Database Drivers
thenewstack.io·15h
🔧PTX
Deep Integration and the Convergence of Model Architecture and Hardware in AI
dev.to·1d·
Discuss: DEV
🎯Tensor Cores
onedraw — a GPU-driven 2D renderer
dev.to·1d·
Discuss: DEV
✂️CUTLASS
News for October 2025
ptreview.sublinear.info·9h
🔄ONNX
End-to-End Framework Integrating Generative AI and Deep Reinforcement Learning for Autonomous Ultrasound Scanning
arxiv.org·3h
🧮cuDNN
I Benchmarked 3 Go Concurrency Patterns. The "Fastest" One Would Destroy Production
dev.to·2h·
Discuss: DEV
⏱️CUDA Events
Identifying Linux Kernel Instability Due to Poor RCU Synchronization
arxiv.org·3h
⏱️CUDA Events
Replacing my old desktop, a high-end Linux PC
boyter.org·1d
🔧PTX
From Uniform to Adaptive: General Skip-Block Mechanisms for Efficient PDE Neural Operators
arxiv.org·3h
🏎️TensorRT
Free Functions Don't Change Performance (Much)
16bpp.net·18h·
Discuss: Hacker News, r/cpp
📊Profiling Tools
Hyper Hawkes Processes: Interpretable Models of Marked Temporal Point Processes
arxiv.org·3h
🏎️TensorRT
Armada Launches Bridge to Power the Next Generation of AI Infrastructure
prnewswire.com·21h
🔗NCCL
How Data 360 Vector Search Delivers Near Real-Time Intelligence on 90% of Enterprise Data
engineering.salesforce.com·16h
🤖AI Coding Tools
CueBench: Advancing Unified Understanding of Context-Aware Video Anomalies in Real-World
arxiv.org·3h
🧮cuDNN
A Friendly Tour of Process Memory on Linux
0xkato.xyz·9h·
Discuss: Hacker News
📊Profiling Tools
Hybrid-Attention models are the future for SLMs
inference.net·6h·
Discuss: Hacker News
Flash Attention
Connectivity Structure and Dynamics of Nonlinear Recurrent Neural Networks
journals.aps.org·8h
📉Model Quantization
LeMiCa: Lexicographic Minimax Path Caching for Efficient Diffusion-Based Video Generation
arxiv.org·3h
Flash Attention