⚡ Cuda
GTaP: A GPU-Resident Fork-Join Task-Parallel Runtime with a Pragma-Based Interface
🌊 CUDA Streams · arxiv.org · 1d
Save 4× GPU Memory With One Line of Python: TurboQuant + HuggingFace
🏎️ TensorRT · medium.com · 3d
Run it for yourself: compute time dilation
✂️ CUTLASS · github.com · 5h · Hacker News
PiTorch: ML on Baremetal Raspberry Pis
📜 TorchScript · masonjwang.com · 23h · Hacker News
Floating point from scratch: Hard Mode
🔄 SIMD Programming · news.ycombinator.com · 1d · Hacker News
How do you compute?
✂️ CUTLASS · thatalexguy.dev · 4d
Beyond torch.softmax: Building a Custom Memory-Efficient Softmax Kernel from Scratch
🎯 Tensor Cores · medium.com · 2d
GPU Memory Math for LLMs: 2026 Edition
📈 GPU Occupancy · medium.com · 5d
Breaking Down the Cerebras Wafer Scale Engine
⚡ CUDA Programming Patterns · wafer.substack.com · 5d · Substack
Simple, reactive programming environment for the Julia Language
📜 TorchScript · plutojl.org · 6d · Hacker News
Foundry: Template-Based CUDA Graph Context Materialization for Fast LLM Serving Cold Start
🌊 CUDA Streams · arxiv.org · 15h
Dimforge Q1 2026 technical report − GPU compute with khal, vortx, inferi
🎯 GPU Kernels · dimforge.com · 4d · r/rust
PsyChip/VEC: Dead simple GPU-resident vector database
✂️ CUTLASS · github.com · 1d · Hacker News
No need to purchase a high-end GPU machine to run local LLMs with massive context.
🔗 NCCL · medium.com · 6d
A WIP arbitrary precision arithmetic library (alternative to GMP)
🎯 Tensor Cores · news.ycombinator.com · 2d · Hacker News
Making Room for AI: Multi-GPU Molecular Dynamics with Deep Potentials in GROMACS
🔗 NCCL · arxiv.org · 15h
MysticCodingCat/CUDA-Native-HUBO: A GPU-native solver for 3-way combinatorial optimization (HUBO). Achieving digital-annealer-level performance on a single RTX 3060 Ti
⚡ CUDA Programming Patterns · github.com · 14h · Hacker News
Splats under Pressure: Exploring Performance-Energy Trade-offs in Real-Time 3D Gaussian Splatting under Constrained GPU Budgets
🌊 CUDA Streams · arxiv.org · 15h
Gemma4.java: Run Gemma 4 in pure Java (no Python, no JNI)
✂️ CUTLASS · github.com · 2d · Hacker News
NEURA: A Unified and Retargetable Compilation Framework for Coarse-Grained Reconfigurable Architectures
✂️ CUTLASS · arxiv.org · 2d