Mixed Precision, FP16, WMMA, Matrix Multiplication, Deep Learning Acceleration

Deep Integration and the Convergence of Model Architecture and Hardware in AI
dev.to·1d·
Discuss: DEV
🔗NCCL
Can-t stop till you get enough
cant.bearblog.dev·1d·
Discuss: Hacker News
📜TorchScript
Beyond ImageNet: Understanding Cross-Dataset Robustness of Lightweight Vision Models
arxiv.org·3h
🧮cuDNN
Attention Is All You Need for KV Cache in Diffusion LLMs
paperium.net·4h·
Discuss: DEV
🔲Loop Tiling
A Soft‑Fork Proposal for Blockchain‑Based Distributed AI Computation
hackernoon.com·21h
🔗NCCL
My First Multi-GPU Kernel: Writing All-to-All for AMD MI300X
gau-nernst.github.io·1d·
Discuss: Hacker News
🎯GPU Kernels
Gated DeltaNet (Linear Attention variant in Qwen3-Next and Kimi Linear)
sebastianraschka.com·1d·
Discuss: r/LLM
👁️Attention Optimization
Hybrid-Attention models are the future for SLMs
inference.net·6h·
Discuss: Hacker News
Flash Attention
Co-Simulation Framework for Parallel DNN Execution on Chiplet-Based Systems (UW–Madison, Washington State)
semiengineering.com·11h
🌊CUDA Streams
Don't let these 3 CPU specs trick you into paying more
xda-developers.com·12h
Flash Attention
I made a tensor runtime & inference framework in C (good for learning how inference works)
github.com·1d·
📜TorchScript
On the Structure of Floating-Point Noise in Batch-Invariant GPU Matrix Multiplication
arxiv.org·3h
✂️CUTLASS
Transformer-Based Decoding in Concatenated Coding Schemes Under Synchronization Errors
arxiv.org·3h
Flash Attention
Kimi Linear: An Expressive, Efficient Attention Architecture
arxiviq.substack.com·2d·
Discuss: Substack
🧩Attention Kernels
The Evolution of GPUs: How Floating-Point Changed Computing
dell.com·1d·
Discuss: Hacker News
🔧PTX
Real-time stock volatility prediction with deep learning on a time-series DB
medium.com·34m·
Discuss: Hacker News
ONNX Runtime
MobileNetV3 Paper Walkthrough: The Tiny Giant Getting Even Smarter
towardsdatascience.com·1d
📉Model Quantization
Neuromorphic Computing: Building Brain-Inspired Processors to Revolutionize Technology
thetasvibe.blogspot.com·1d
Flash Attention
Design of quasi phase matching crystal based on differential gray wolf algorithm
arxiv.org·3h
🌐Distributed Computing