🖥️ GPU Programming - jhcha.oyo · Scour

KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++

⚡Flash Attention Code

github.com··Hacker News

AgentCompile: An LLM-Guided Compiler for Direct CUDA Inference

🤖AI Academic

1-bit and 1.58 bit LLM Benchmarking on Jetson Orin Nano Super | Bonsai LM

smolhub.com··r/LocalLLaMA

RenderLab – Prototype rendering techniques and renderers in the browser

✨Computer Graphics

pub.prklinteractive.com··Hacker News

DeepSeekV4 1.6T Day 0 to Day 43 Performance Over Time - Huawei, GB300 NVL72, MI355X, B200

💬LLMs News

newsletter.semianalysis.com··Hacker News

Open source building blocks for computational design. Est. 2006

💻Programming Languages

thi.ng··Hacker News

Why Compiler Engineers Rarely Use Strassen's Algorithm for Fast Matrix Multiplications

⚡Hardware Acceleration News Blog

leetarxiv.substack.com··Substack, r/programming

Unsloth Gemma 4 QAT

⚡Quantization

NVIDIA and LG Group Build an AI Factory to Advance Physical AI, Mobility and AI Infrastructure

✨Computer Graphics Blog

blogs.nvidia.com··Hacker News

nex-agi/Nex-N2-mini • Huggingface

huggingface.co··r/LocalLLaMA

APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing

⚡Hardware Acceleration Academic

I stopped using most of Rust’s advanced features for my ML library

🤖AI Code

github.com··r/rust

Unpacking AI: The Hardware Behind AI

🤖AI News

pathtostaff.com··Hacker News

ASTRA-sim 3.0: Next-Level Distributed Machine Learning Simulations via High-Fidelity GPU and Infrastructure Modeling

✨Computer Graphics Academic

sgl-project/sglang-omni: SGLang Omni: High-Performance Multi-Stage Pipeline Framework for Omni Models

⚡Hardware Acceleration Code

AutoMegaKernel: A Statically-Checked Agent Harness for Self-Retargeting Megakernel Synthesis

🤖AI Academic

arxiv.org··Hacker News

Density Field State Space Models: 1-Bit Distillation, Efficient Inference, and Knowledge Organization in Mamba-2

⚡Hardware Acceleration Academic

maziyarpanahi/openmed: open-source healthcare ai

🤖AI Code

LLM-Based Porting of Optimized C++ to CUDA Through Deoptimization and Reoptimization

💬LLMs Academic

SET: Stream-Event-Triggered Scheduling for Efficient CUDA Graph Pipelines

⚡Hardware Acceleration Academic
