🔧 PTX
GPU Assembly, CUDA ISA, Kernel Optimization, Low-level Programming
Scoured 121025 posts in 2.13 s
From Buffers to Registers: Unlocking Fine-Grained FlashAttention with Hybrid-Bonded 3D NPU Co-Design
arxiv.org · 9h
⚡ Flash Attention
htfab/microlane: Self-contained RTL to GDS flow for simple chip designs
github.com · 15h · Discuss: Hacker News
✂️ CUTLASS
Zvec: SQLite-like simplicity in an embedded vector database (By Alibaba)
zvec.org · 1h · Discuss: Hacker News
✂️ CUTLASS
AI Inference Needs A Mix-And-Match Memory Strategy
semiengineering.com · 6h
🎯 Tensor Cores
Fine Grained Everything, and what comes after React Server Components
blog.logrocket.com · 1d
🔄 ONNX
Intel "Nova Lake" Compute Tile Die-sizes Surface
techpowerup.com · 1d
🔲 Loop Tiling
MSI GeForce RTX 5090 Lightning Z review – Lightning-fast and thirsty unicorn in battle against NVIDIA’s clock speed barriers
igorslab.de · 18m
📈 GPU Occupancy
Ph42oN/dxvk-gplasync
gitlab.com · 16h
⏱️ CUDA Events
Hitting 1,000 tokens per second on a single RTX 5090
blog.alpindale.net · 3d · Discuss: Hacker News
🎛️ CUDA Optimization
New AMD Adrenalin Driver
bluesnews.com · 12h
🎮 NVIDIA
One Platform to Run Apps, Data, and AI Anywhere
nutanix.com · 18h
⚡ ONNX Runtime
CodeSOD: Consistently Transactional
thedailywtf.com · 7h
🌳 Git Internals
How Andrej Karpathy Built a Working Transformer in 243 Lines of Code
analyticsvidhya.com · 1h
📜 TorchScript
Passing the Torch: Reflections on ARC’s Journey and the Future of Specialized Processing
eetimes.com · 2d
⚡ Flash Attention
CPU cloth simulation performance comparable to GPU SotA
sig25ddmpd.github.io · 13h · Discuss: Hacker News
✂️ CUTLASS
Results from the Advent of FPGA Challenge
blog.janestreet.com · 10h · Discuss: Hacker News
🎯 Tensor Cores
AndPuQing/gflow: A lightweight, single-node GPU job scheduler implemented in Rust.
github.com · 1d · Discuss: Hacker News
📊 CUDA Graphs
Show HN: Solving Sudoku reasoning via Energy Geometric models
davisgeometric.com · 4h · Discuss: Hacker News
✂️ CUTLASS
How Anam Achieved 250% Faster Inference Using Zymtrace Continuous GPU Profiling
zymtrace.com · 3d
🔍 Nsight
How to connect Convex to RunPod for serverless GPU workloads
stack.convex.dev · 2d
✂️ CUTLASS