My First Multi-GPU Kernel: Writing All-to-All for AMD MI300X
gau-nernst.github.io·1d
Model Efficiency
H-FA: A Hybrid Floating-Point and Logarithmic Approach to Hardware Accelerated FlashAttention
arxiv.org·7h
Model Efficiency
GPU Pro – Master Your AI Workflow
github.com·1d
Model Efficiency
Intel's LLM-Scaler Updated With OpenAI's GPT-OSS Model Support
phoronix.com·1h
LLM Optimization
Why stop at 1 million tokens when you can have 10? My journey to extreme context on a gaming GPU. [P]
reddit.com·1h
Model Efficiency
Why Multimodal AI Broke the Data Pipeline — And How Daft Is Beating Ray and Spark to Fix It
hackernoon.com·1d
Model Efficiency
ZkML Breakthrough: 13B Models Verified in 15 Minutes
lightcapai.medium.com·1d
LLM Optimization
Learning Sparse Approximate Inverse Preconditioners for Conjugate Gradient Solvers on GPUs
arxiv.org·1d
Model Efficiency
Automated Anomaly Detection and Self-Calibration in CMUT Array Fabrication via Bayesian Optimization
dev.to·1d
Model Efficiency
Why stop at 1M tokens when you can have 10M?
news.ycombinator.com·1h
Model Efficiency
Hybrid-Attention models are the future for SLMs
inference.net·10h
Model Efficiency
AMD Will Continue Game Optimization Support For Older Radeon GPU's After All
tech.slashdot.org·12h
Model Efficiency
This Month in Ladybird – October 2025
ladybird.org·56m
🛠️Developer Tools
A Thesis and Playbook for Edge AI
ondeviceguy.substack.com·1d
Model Efficiency
Torchforge – a PyTorch native library for scalable RL post-training
pytorch.org·5d
Model Efficiency
Dive into Systems
diveintosystems.org·19h
✍️Prompt Engineering
CueBench: Advancing Unified Understanding of Context-Aware Video Anomalies in Real-World
arxiv.org·7h
✍️Prompt Engineering
Labs for Broke – EKS for Pennies
georgedeblog.com·7h
Model Efficiency
The next RISC-V processor frontier: AI
edn.com·4d
Model Efficiency
Casing Collar Identification using AlexNet-based Neural Networks for Depth Measurement in Oil and Gas Wells
arxiv.org·7h
LLM Optimization