Scour
🖥️ GPU Programming
CUDA, Parallel Computing, Graphics APIs, Compute Shaders
Scoured 42 posts in 14.1 ms
KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++
⚡ Flash Attention
Content type: Code
github.com · 3 days ago · Hacker News

AgentCompile: An LLM-Guided Compiler for Direct CUDA Inference
🤖 AI
Content type: Academic
arxiv.org · 1 day ago

1-bit and 1.58 bit LLM Benchmarking on Jetson Orin Nano Super | Bonsai LM
💬 LLMs
smolhub.com · 2 days ago · r/LocalLLaMA

RenderLab – Prototype rendering techniques and renderers in the browser
✨ Computer Graphics
pub.prklinteractive.com · 6 days ago · Hacker News

DeepSeekV4 1.6T Day 0 to Day 43 Performance Over Time - Huawei, GB300 NVL72, MI355X, B200
💬 LLMs
Content type: News
newsletter.semianalysis.com · 1 day ago · Hacker News

Open source building blocks for computational design. Est. 2006
💻 Programming Languages
thi.ng · 3 days ago · Hacker News

Why Compiler Engineers Rarely Use Strassen's Algorithm for Fast Matrix Multiplications
⚡ Hardware Acceleration
Content type: News, Blog
leetarxiv.substack.com · 2 days ago · Substack, r/programming

Unsloth Gemma 4 QAT
⚡ Quantization
unsloth.ai · 5 days ago

NVIDIA and LG Group Build an AI Factory to Advance Physical AI, Mobility and AI Infrastructure
✨ Computer Graphics
Content type: Blog
blogs.nvidia.com · 2 days ago · Hacker News

nex-agi/Nex-N2-mini • Huggingface
🤖 AI
huggingface.co · 6 days ago · r/LocalLLaMA

APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing
⚡ Hardware Acceleration
Content type: Academic
arxiv.org · 1 day ago

I stopped using most of Rust’s advanced features for my ML library
🤖 AI
Content type: Code
github.com · 2 days ago · r/rust

Unpacking AI: The Hardware Behind AI
🤖 AI
Content type: News
pathtostaff.com · 4 days ago · Hacker News

ASTRA-sim 3.0: Next-Level Distributed Machine Learning Simulations via High-Fidelity GPU and Infrastructure Modeling
✨ Computer Graphics
Content type: Academic
arxiv.org · 18 hours ago

sgl-project/sglang-omni: SGLang Omni: High-Performance Multi-Stage Pipeline Framework for Omni Models
⚡ Hardware Acceleration
Content type: Code
github.com · 20 hours ago

AutoMegaKernel: A Statically-Checked Agent Harness for Self-Retargeting Megakernel Synthesis
🤖 AI
Content type: Academic
arxiv.org · 1 day ago · Hacker News

Density Field State Space Models: 1-Bit Distillation, Efficient Inference, and Knowledge Organization in Mamba-2
⚡ Hardware Acceleration
Content type: Academic
arxiv.org · 18 hours ago

maziyarpanahi/openmed: open-source healthcare ai
🤖 AI
Content type: Code
github.com · 20 hours ago

LLM-Based Porting of Optimized C++ to CUDA Through Deoptimization and Reoptimization
💬 LLMs
Content type: Academic
arxiv.org · 5 days ago

SET: Stream-Event-Triggered Scheduling for Efficient CUDA Graph Pipelines
⚡ Hardware Acceleration
Content type: Academic
arxiv.org · 5 days ago