🔧 PTX
GPU Assembly, CUDA ISA, Kernel Optimization, Low-level Programming

Ph42oN / dxvk-gplasync
gitlab.com · 12h
⏱️ CUDA Events

Hitting 1,000 tokens per second on a single RTX 5090
blog.alpindale.net · 3d · Discuss: Hacker News, Hacker News
🎛️ CUDA Optimization

One Platform to Run Apps, Data, and AI Anywhere
nutanix.com · 14h
⚡ ONNX Runtime

New AMD Adrenalin Driver
bluesnews.com · 9h
🎮 NVIDIA

Running my kernel on real hardware
kamkow1lair.pl · 2d · Discuss: Hacker News
🏗️ Build Systems

Passing the Torch: Reflections on ARC’s Journey and the Future of Specialized Processing
eetimes.com · 2d
⚡ Flash Attention

Results from the Advent of FPGA Challenge
blog.janestreet.com · 6h · Discuss: Hacker News
🎯 Tensor Cores

CPU cloth simulation performance comparable to GPU SotA
sig25ddmpd.github.io · 10h · Discuss: Hacker News
✂️ CUTLASS

From 34% to 96%: The Porting Initiative Delivers
hologram.page · 10h · Discuss: Hacker News
🔄 ONNX

Show HN: Solving Sudoku reasoning via Energy Geometric models
davisgeometric.com · 1h · Discuss: Hacker News
✂️ CUTLASS

OSDev Bare Bones with Rust - Cross-Compilation and Freestanding
dev.to · 1d · Discuss: DEV
🏗️ Build Systems

How to connect Convex to RunPod for serverless GPU workloads
stack.convex.dev · 2d
✂️ CUTLASS

An async HTTP server in ~80 lines of modern C++ (coroutines)
vixcpp.com · 2h · Discuss: Hacker News
⚙️ JIT Compilation

Rewrote my Node.js data generator in Rust. 20x faster, but the 15MB binary (vs 500MB node_modules) is the real win.
algomimic.com · 1d · Discuss: r/rust
📊 Profiling Tools

ALPHA-PIM: Analysis of Linear Algebraic Processing for High-Performance Graph Applications on a Real Processing-In-Memory System
arxiv.org · 1d
🔢 cuBLAS

Parallel Track Transformers: Enabling Fast GPU Inference with Reduced Synchronization
machinelearning.apple.com · 2d
⏱️ CUDA Events

Minimum Energy Per Query
semiengineering.com · 2h
📈 Occupancy Optimization

Mesa 26.0 Released With RADV Ray Tracing Performance Gains
linuxiac.com · 14h
🔍 Nsight

Pi.dev: There are many coding agents, but this one is mine
pi.dev · 3h · Discuss: Hacker News
🤖 AI Coding Tools

[News] SK hynix Unveils AI Chip Architecture with HBF, Reportedly Boosts Performance per Watt by Up to 2.69×
trendforce.com · 8h · Discuss: r/hardware
⚡ Flash Attention