GPU
⚡ GPU
cuda,triton
Scoured 36 posts in 18.3 ms
Exploiting GPU Tensor Cores from Java using Babylon [Juan Fumero]
🤖 llm · openjdk.org · 5 days ago · Lobsters, r/java
AmrDeveloper/Turtle: A Heterogeneous Pythonic 🐍 language to practice targeting CPU & GPU in the same program on Mobile Devices Influenced by Python, Mojo and CUDA
🐍 Python · Content type: Code · github.com · 2 days ago · Hacker News
Training Cycle Halved: LoongForge End-to-End Optimization for GR00T N1.6 Delivers 2.3× Throughput
🔱 Triton · baidu-baige.github.io · 20 hours ago · Hacker News
Polars GPU engine — cudf 26.06.01 documentation
🔱 Triton · Content type: Reference · docs.rapids.ai · 2 days ago · Hacker News
AutoMegaKernel: A Statically-Checked Agent Harness for Self-Retargeting Megakernel Synthesis
🚀 CUDA Kernels · Content type: Academic · arxiv.org · 5 days ago · Hacker News
RTX 5080 + RTX 3090 Setup: 80+ Tok/s on Qwen 3.6 27B Q8
🤖 llm · Content type: Blog · imil.net · 1 day ago · Hacker News, r/LocalLLaMA · Cited by 2 articles
Making FlashAttention-4 faster for inference
🚀 CUDA Kernels · Content type: Blog · modal.com · 3 days ago · Hacker News
1-bit and 1.58 bit LLM Benchmarking on Jetson Orin Nano Super | Bonsai LM
🤖 llm · smolhub.com · 6 days ago · r/LocalLLaMA
Orchestrate your LLM pipeline. Locally
🤖 llm · llmforge.app · 2 days ago · Hacker News
CommBench: Can LLMs Write Correct and Efficient GPU Communication Code?
🤖 llm · uccl-project.github.io · 3 days ago · Hacker News
DeepSeekV4 1.6T Day 0 to Day 43 Performance Over Time - Huawei, GB300 NVL72, MI355X, B200
🤖 llm · Content type: News · newsletter.semianalysis.com · 5 days ago · Hacker News · Cited by 1 article
4× RTX Pro 6000 Blackwell on Water, and the One Card That Wouldn't Behave
🤖 llm · Content type: Blog · sabareesh.com · 2 days ago · Hacker News, r/LocalLLaMA
Personal AI for Research, Voice, and Everyday Tasks
🤖 llm · whissle.ai · 1 day ago · Hacker News
Less-relevant results
Local models in mid-2026: the engineering that closed the gap
🤖 llm · coles.codes · 2 days ago · Hacker News, r/LocalLLaMA
Agentic Memory Management for GPU Code Generation
🤖 llm · Content type: Blog · ucbskyadrs.github.io · 3 days ago · Hacker News
Anyone been using CUDA 13.3 for the past week or 2?
🤖 llm · Content type: Code · github.com · 1 day ago · r/LocalLLaMA
Mojo Nightly
🦀 Rust · Content type: Blog · mojolang.org · 3 days ago · Hacker News
Why Compiler Engineers Rarely Use Strassen's Algorithm for Fast Matrix Multiplications
🦀 Rust · Content type: News, Blog · leetarxiv.substack.com · 5 days ago · Substack, r/programming
Two Leaps to 1000 Tokens/s on a 1T-Parameter Model: On Inference Systems, Execution Boundaries, and Co-Design
📶 Beamforming · Content type: Blog · tilert.ai · 5 days ago · Hacker News · Cited by 2 articles
Profiling in PyTorch (Part 2): From nn.Linear to a Fused MLP
🐍 Python · Content type: Blog · huggingface.co · 3 days ago · Hacker News · Cited by 1 article