🎛️ CUDA Optimization
Kernel Tuning, Memory Access Patterns, Thread Configuration
Scoured 82630 posts in 423.0 ms
Anthropic's Performance Take-Home: A 65x Optimization (For Dummies)
ikot.blog · 1d · Discuss: Hacker News
📈 Occupancy Optimization
The Linux graphics stack in a nutshell, part 1
lwn.net · 6h · Discuss: Hacker News
🔧 PTX
Design of a GPU with Heterogeneous Cores for Graphics
arxiv.org · 2d
🎯 GPU Kernels
Show HN: C discrete event SIM w stackful coroutines runs 45x faster than SimPy
github.com · 23h · Discuss: Hacker News
⏱️ CUDA Events
Using Nsight Compute with large codebases - Part 2: Profiling large code bases
blog.ncompass.tech · 22h · Discuss: Hacker News
🔍 Nsight
WebGPU Compute Shaders
webgpufundamentals.org · 7h
🎮 NVIDIA
Hetccl Shows Scaling Of Multi-Vendor GPU Clusters For Large Language Models
quantumzeitgeist.com · 14h
🔗 NCCL
WritePolicyBench: Benchmarking Memory Write Policies under Byte Budgets
arxiv.org · 10h
🔲 Loop Tiling
Optimized LLM Inference Engines
rishirajacharya.com · 32m
⚡ ONNX Runtime
Engineering Ethereum's Speed: How we made Ethrex 20x faster
blog.lambdaclass.com · 14m
⏱️ Benchmarking
slow abstraction
steel-water.bearblog.dev · 8h
🐕 Ruff
A faster CPU won’t fix your next PC upgrade
xda-developers.com · 19h
📈 Occupancy Optimization
AMD Intros Kintex UltraScale+ Gen 2 FPGAs
servethehome.com · 8h
🔧 PTX
Diffusion LLM Sampling Achieves 70% Latency Reduction With Novel NPU Design
quantumzeitgeist.com · 2d
🎯 Tensor Cores
ML for Energy-Performance-Aware Scheduling On Heterogeneous Multicore Architectures (Cambridge)
semiengineering.com · 1d
📈 Occupancy Optimization
Scaling Video Encoding with Edge AI Power
dev.to · 10h · Discuss: DEV
⚡ Flash Attention
Go Deep Dive: Mutex vs RWMutex
dev.to · 2h · Discuss: DEV
⚡ CUDA Programming Patterns
The Heartbeat of Tetris 🟥🟥🟥🟥: What a 1x1 Pixel Taught Me About Concurrency
qianarthurwang.substack.com · 22h · Discuss: r/programming
⚡ CUDA Programming Patterns
Confidential Computing Adds a Crazy Amount of Overhead to GPUs
bomfather.dev · 29m · Discuss: Hacker News
📈 GPU Occupancy
The SPECviewperf benchmark reaches milestone
jonpeddie.com · 21h
🔍 Nsight