Scour
🟩 CUDA · Specific
GPU Kernels, Parallel Computing, NVIDIA, PTX

Scoured 49 posts in 9.4 ms

CUDA-Oxide 0.2 Brings Early Improvements To Pure Rust CUDA Kernels
💻 OS · phoronix.com · 6 days ago

GPUsnek is Python on nVidia’s CUDA
💻 OS · Content type: Blog · blog.adafruit.com · 21 hours ago

AutoMegaKernel: A Statically-Checked Agent Harness for Self-Retargeting Megakernel Synthesis
💻 OS · Content type: Academic · arxiv.org · 2 days ago · Hacker News

Less-relevant results
First Steps Toward Automated AI Research
💻 OS · recursive.com · 2 hours ago · Hacker News

RightNow-AI/AutoMegaKernel: An agent harness that compiles a model into one provably-correct, self-retargeting CUDA megakernel and self-tunes it past cuBLAS at batch-1 LLM decode.
💻 OS · Content type: Code · github.com · 3 days ago · Hacker News

Exploiting GPU Tensor Cores from Java using Babylon [Juan Fumero]
🎮 GPU Architecture · openjdk.org · 2 days ago · Lobsters, r/java

Profiling in PyTorch (Part 2): From nn.Linear to a Fused MLP
💻 OS · Content type: Blog · huggingface.co · 17 hours ago · Hacker News

SoC FPGA advances wideband RF processing
🎮 GPU Architecture · edn.com · 21 hours ago

WarpGuard: Protected-Site Control-Flow Integrity for CUDA SASS Binaries
💻 OS · Content type: Academic · arxiv.org · 13 hours ago

Google unveils DiffusionGemma, delivering up to 4x faster inference on dedicated GPUs
💡 FlashAttention · alternativeto.net · 5 hours ago

KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++
⚙ MLSys · Content type: Code · github.com · 4 days ago · Hacker News

NVIDIA at Computex 2026: RTX Spark Gaming Hands-On, DLSS 4.5, and More
💻 OS · techpowerup.com · 4 hours ago

Vortex expands open RISC-V graphics
🎮 GPU Architecture · jonpeddie.com · 20 hours ago

Vortex 3.0 Released As Full-Stack, Open-Source RISC-V GPU Now With 3D Pipeline
💻 OS · phoronix.com · 2 days ago

Huawei-led team claims it post-trained DeepSeek's 1.6-trillion-parameter model — 1,000 Ascend 910C chips used in training
📦 TVM · Content type: News · tomshardware.com · 5 days ago · Hacker News

DiffusionGemma is Google’s fastest AI yet, but it comes with a big trade-off
💡 FlashAttention · androidauthority.com · 11 hours ago

Big Banks Eye New AI Compute Trading Market
📦 TVM · pymnts.com · 2 days ago

Google’s DiffusionGemma is 4x faster than its other Gemma models
💡 FlashAttention · thenewstack.io · 1 day ago

NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI
⚙ MLSys · Content type: Blog · blogs.nvidia.com · 1 day ago

Supermicro Stock Falls On Plans To Raise $7Bn In Capital
🐧 Kernel Dev · catenaa.com · 21 hours ago · Hacker News