CUDA

CUDA-Oxide 0.2 Brings Early Improvements To Pure Rust CUDA Kernels

 💻OS
phoronix.com·

GPUsnek is Python on nVidia’s CUDA

 💻OS  Content type: Blog
blog.adafruit.com·

WarpGuard: Protected-Site Control-Flow Integrity for CUDA SASS Binaries

 💻OS  Content type: Academic
arxiv.org·
Less-relevant results

First Steps Toward Automated AI Research

 💻OS
recursive.com··Hacker News

RightNow-AI/AutoMegaKernel: An agent harness that compiles a model into one provably-correct, self-retargeting CUDA megakernel and self-tunes it past cuBLAS at batch-1 LLM decode.

 💻OS  Content type: Code
github.com··Hacker News

Exploiting GPU Tensor Cores from Java using Babylon [Juan Fumero]

 🎮GPU Architecture
openjdk.org··Lobsters, r/java

Profiling in PyTorch (Part 2): From nn.Linear to a Fused MLP

 💻OS  Content type: Blog
huggingface.co··Hacker News

Making FlashAttention-4 faster for inference

 💻OS  Content type: Blog
modal.com·

SoC FPGA advances wideband RF processing

 🎮GPU Architecture
edn.com·

Vortex expands open RISC-V graphics

 🎮GPU Architecture
jonpeddie.com·

KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++

 MLSys  Content type: Code
github.com··Hacker News

AutoMegaKernel: A Statically-Checked Agent Harness for Self-Retargeting Megakernel Synthesis

 💻OS  Content type: Academic
arxiv.org··Hacker News

Google unveils DiffusionGemma, delivering up to 4x faster inference on dedicated GPUs

 💡FlashAttention
alternativeto.net·

Vortex 3.0 Released As Full-Stack, Open-Source RISC-V GPU Now With 3D Pipeline

 💻OS
phoronix.com·

NVIDIA at Computex 2026: RTX Spark Gaming Hands-On, DLSS 4.5, and More

 💻OS
techpowerup.com·

Huawei-led team claims it post-trained DeepSeek's 1.6-trillion-parameter model — 1,000 Ascend 910C chips used in training

 📦TVM  Content type: News

DiffusionGemma is Google’s fastest AI yet, but it comes with a big trade-off

 💡FlashAttention
androidauthority.com·

Big Banks Eye New AI Compute Trading Market

 📦TVM
pymnts.com·

Google's new open-weights model brings image-generation tricks to AI text generation

 MLSys  Content type: News
theregister.com·

Google’s DiffusionGemma is 4x faster than its other Gemma models

 💡FlashAttention
thenewstack.io·
