🟩 CUDA - emulbasaka

💻OS Blog

blog.adafruit.com·

AutoMegaKernel: A Statically-Checked Agent Harness for Self-Retargeting Megakernel Synthesis

💻OS Academic

arxiv.org··Hacker News

Less-relevant results

First Steps Toward Automated AI Research

💻OS

recursive.com··Hacker News

RightNow-AI/AutoMegaKernel: An agent harness that compiles a model into one provably-correct, self-retargeting CUDA megakernel and self-tunes it past cuBLAS at batch-1 LLM decode.

💻OS Code

github.com··Hacker News

Exploiting GPU Tensor Cores from Java using Babylon [Juan Fumero]

🎮GPU Architecture

openjdk.org··Lobsters, r/java

Profiling in PyTorch (Part 2): From nn.Linear to a Fused MLP

💻OS Blog

huggingface.co··Hacker News

SoC FPGA advances wideband RF processing

🎮GPU Architecture

edn.com·

WarpGuard: Protected-Site Control-Flow Integrity for CUDA SASS Binaries

💻OS Academic

arxiv.org·

Google unveils DiffusionGemma, delivering up to 4x faster inference on dedicated GPUs

💡FlashAttention

alternativeto.net·

KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++

⚙MLSys Code

github.com··Hacker News

NVIDIA at Computex 2026: RTX Spark Gaming Hands-On, DLSS 4.5, and More

💻OS

techpowerup.com·

Vortex expands open RISC-V graphics

🎮GPU Architecture

jonpeddie.com·

Vortex 3.0 Released As Full-Stack, Open-Source RISC-V GPU Now With 3D Pipeline

💻OS

phoronix.com·

Huawei-led team claims it post-trained DeepSeek's 1.6-trillion-parameter model — 1,000 Ascend 910C chips used in training

📦TVM News

tomshardware.com··Hacker News

DiffusionGemma is Google’s fastest AI yet, but it comes with a big trade-off

💡FlashAttention

androidauthority.com·

Big Banks Eye New AI Compute Trading Market

📦TVM

pymnts.com·

Google’s DiffusionGemma is 4x faster than its other Gemma models

💡FlashAttention

thenewstack.io·

NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI

⚙MLSys Blog

blogs.nvidia.com·

Supermicro Stock Falls On Plans To Raise $7Bn In Capital

🐧Kernel Dev

catenaa.com··Hacker News

CUDA-Oxide 0.2 Brings Early Improvements To Pure Rust CUDA Kernels

GPUsnek is Python on nVidia’s CUDA
