Scour
🟩 CUDA (Specific): GPU Kernels, Parallel Computing, NVIDIA, PTX

Scoured 52 posts in 12.6 ms

CUDA-Oxide 0.2 Brings Early Improvements To Pure Rust CUDA Kernels
💻 OS · phoronix.com · 6 days ago

GPUsnek is Python on nVidia’s CUDA
💻 OS · Content type: Blog · blog.adafruit.com · 1 day ago

WarpGuard: Protected-Site Control-Flow Integrity for CUDA SASS Binaries
💻 OS · Content type: Academic · arxiv.org · 16 hours ago

Less-relevant results
First Steps Toward Automated AI Research
💻 OS · recursive.com · 5 hours ago · Hacker News

RightNow-AI/AutoMegaKernel: An agent harness that compiles a model into one provably-correct, self-retargeting CUDA megakernel and self-tunes it past cuBLAS at batch-1 LLM decode.
💻 OS · Content type: Code · github.com · 3 days ago · Hacker News

Exploiting GPU Tensor Cores from Java using Babylon [Juan Fumero]
🎮 GPU Architecture · openjdk.org · 2 days ago · Lobsters, r/java

Profiling in PyTorch (Part 2): From nn.Linear to a Fused MLP
💻 OS · Content type: Blog · huggingface.co · 20 hours ago · Hacker News

Making FlashAttention-4 faster for inference
💻 OS · Content type: Blog · modal.com · 8 hours ago

SoC FPGA advances wideband RF processing
🎮 GPU Architecture · edn.com · 1 day ago

Vortex expands open RISC-V graphics
🎮 GPU Architecture · jonpeddie.com · 23 hours ago

KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++
⚙ MLSys · Content type: Code · github.com · 4 days ago · Hacker News

AutoMegaKernel: A Statically-Checked Agent Harness for Self-Retargeting Megakernel Synthesis
💻 OS · Content type: Academic · arxiv.org · 2 days ago · Hacker News

Google unveils DiffusionGemma, delivering up to 4x faster inference on dedicated GPUs
💡 FlashAttention · alternativeto.net · 8 hours ago

Vortex 3.0 Released As Full-Stack, Open-Source RISC-V GPU Now With 3D Pipeline
💻 OS · phoronix.com · 2 days ago

NVIDIA at Computex 2026: RTX Spark Gaming Hands-On, DLSS 4.5, and More
💻 OS · techpowerup.com · 7 hours ago

Huawei-led team claims it post-trained DeepSeek's 1.6-trillion-parameter model — 1,000 Ascend 910C chips used in training
📦 TVM · Content type: News · tomshardware.com · 5 days ago · Hacker News

DiffusionGemma is Google’s fastest AI yet, but it comes with a big trade-off
💡 FlashAttention · androidauthority.com · 14 hours ago

Big Banks Eye New AI Compute Trading Market
📦 TVM · pymnts.com · 2 days ago

Google's new open-weights model brings image-generation tricks to AI text generation
⚙ MLSys · Content type: News · theregister.com · 2 hours ago

Google’s DiffusionGemma is 4x faster than its other Gemma models
💡 FlashAttention · thenewstack.io · 1 day ago