CUDA

GPU Programming, Kernel Optimization, Parallel Computing, NVIDIA

SK Hynix and Nvidia co-developing memory and memory manufacturing tech

 🔺Triton  Content type: News
blocksandfiles.com·

NVIDIA’s New RTX Spark Superchip Changes Everything for On-the-Go 12K Video Editing and 3D Rendering

 🤖agentic system
canonrumors.com·

On-device AI is a margin decision

 Inference Optimization  Content type: Blog
ziraph.com··Hacker News

WSL 3 will finally let Linux apps use your GPU and NPU without the performance tax

 🔺Triton
xda-developers.com·

Apple WWDC Announces Privacy Leap Off a Cliff

 FlashAttention
flyingpenguin.com·

Google Joins Anthropic in Leasing Compute from SpaceX/xAI

 🔺Triton  Content type: Blog
512pixels.net·

Apple extends Private Cloud Compute to third-party data centers

 🔲TPU Architecture
helpnetsecurity.com·

Location: Edmonton, Canada Remote: Yes Willing to relocate: Yes, within Canada T...

 📊LLM Evaluation  Content type: Discussion

Firefox 153 is finally ditching the Nvidia driver workaround Linux users have hated

 🔺Triton  Content type: News
xda-developers.com·

440 Power! Rare 1970 Plymouth Fury Gran Coupe

 📊LLM Evaluation
barnfinds.com·

Data center infrastructure startup TensorWave raises $350M to help break Nvidia’s AI chip monopoly

 🔺Triton
siliconangle.com·

Unreleased RTX 3050 Ti engineering sample appears in photos and benchmarks — the RTX 3060 alternative that never happened

 🔺Triton  Content type: News
tomshardware.com·

AgentCompile: An LLM-Guided Compiler for Direct CUDA Inference

 🔺Triton  Content type: Academic
arxiv.org·

Veo with Anders Hellerup Madsen and Gorm Casper

 🔺Triton
corrode.dev··r/rust

AMD Radeon RX 9070 GRE vs. Nvidia GeForce RTX 5070

 🔺Triton
club386.com·

KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++

 FlashAttention  Content type: Code
github.com··Hacker News

Introducing Piper: A Programmable Distributed Training System

 🎭Mixture of Experts  Content type: Academic  Content type: Blog

MTP Isn't Always a Win: 1.95x on My 3090, but Speculative Decoding Is Hardware-Dependent

 Inference Optimization  Content type: Blog
bric.pe.kr··DEV

DiffusionGemma: The Developer Guide

 💾KV Cache  Content type: Blog

BeeLlama.cpp DFlash on Strix Halo: 2.7x Gemma 31B, But MTP Is Still Faster

 Inference Optimization
sleepingrobots.com·