Mixed Precision, FP16, WMMA, Matrix Multiplication, Deep Learning Acceleration

Deep Integration and the Convergence of Model Architecture and Hardware in AI
dev.to·3h·
Discuss: DEV
🔗NCCL
Can-t stop till you get enough
cant.bearblog.dev·5h·
Discuss: Hacker News
📜TorchScript
Kimi Linear: An Expressive, Efficient Attention Architecture
arxiviq.substack.com·1d·
Discuss: Substack
🧩Attention Kernels
The Evolution of GPUs: How Floating-Point Changed Computing
dell.com·9h·
Discuss: Hacker News
🔧PTX
ZkML Breakthrough: 13B Models Verified in 15 Minutes
lightcapai.medium.com·7h·
Discuss: Hacker News
🔗NCCL
MobileNetV3 Paper Walkthrough: The Tiny Giant Getting Even Smarter
towardsdatascience.com·10h
📉Model Quantization
How can I use an STM32 and FPGA together for a CNN-based face recognition project?
reddit.com·3h·
Discuss: r/embedded
📉Model Quantization
Moving past speculation: How deterministic CPUs deliver predictable AI performance
venturebeat.com·18h
🧠CPU Architecture
UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning
paperium.net·55m·
Discuss: DEV
🛠Ml-eng
Radxa Launches AICore DX-M1 Edge AI Accelerator with DeepX DX-M1 NPU
linuxgizmos.com·1d
🔧PTX
Generation at the Speed of Thought: Speculative Decoding
bittere.substack.com·12h·
Discuss: Substack
Flash Attention
zFLoRA: Zero-Latency Fused Low-Rank Adapters
arxiv.org·2d
ONNX Runtime
TinyML is the most impressive piece of software you can run on any ESP32
xda-developers.com·2d
ONNX Runtime
ClipTagger-12B VLM: Frame Captioning Tutorial
dev.to·7h·
Discuss: DEV
🔄ONNX
Yes, you should understand backprop (2016)
karpathy.medium.com·18h·
Discuss: Hacker News
📊Gradient Accumulation
A Practitioner's Guide to Kolmogorov-Arnold Networks
arxiviq.substack.com·5h·
Discuss: Substack
📉Model Quantization
Unlocking AI Potential: Squeezing Giant Models into Tiny Spaces
dev.to·34m·
Discuss: DEV
📉Model Quantization
University of Surrey researchers mimic brain wiring to improve AI - BBC
news.google.com·11h
Flash Attention
The Role of GPUs in Accelerating Deep Learning Training
acecloud.ai·3d·
Discuss: DEV
🔗NCCL
Quantum-Resistant Federated Learning with Homomorphic Encryption for Medical Imaging Diagnostics
dev.to·14h·
Discuss: DEV
🎓Model Distillation