GPU Programming

KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++

 Flash Attention  Content type: Code
github.com · Hacker News

AgentCompile: An LLM-Guided Compiler for Direct CUDA Inference

 🤖AI  Content type: Academic
arxiv.org

1-bit and 1.58 bit LLM Benchmarking on Jetson Orin Nano Super | Bonsai LM

 💬LLMs
smolhub.com · r/LocalLLaMA

RenderLab – Prototype rendering techniques and renderers in the browser

 Computer Graphics

DeepSeekV4 1.6T Day 0 to Day 43 Performance Over Time - Huawei, GB300 NVL72, MI355X, B200

 💬LLMs  Content type: News

Open source building blocks for computational design. Est. 2006

 💻Programming Languages
thi.ng · Hacker News

Why Compiler Engineers Rarely Use Strassen's Algorithm for Fast Matrix Multiplications

 Hardware Acceleration  Content type: News  Content type: Blog

Unsloth Gemma 4 QAT

 Quantization
unsloth.ai

NVIDIA and LG Group Build an AI Factory to Advance Physical AI, Mobility and AI Infrastructure

 Computer Graphics  Content type: Blog

nex-agi/Nex-N2-mini • Huggingface

 🤖AI

APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing

 Hardware Acceleration  Content type: Academic
arxiv.org

I stopped using most of Rust’s advanced features for my ML library

 🤖AI  Content type: Code
github.com · r/rust

Unpacking AI: The Hardware Behind AI

 🤖AI  Content type: News

ASTRA-sim 3.0: Next-Level Distributed Machine Learning Simulations via High-Fidelity GPU and Infrastructure Modeling

 Computer Graphics  Content type: Academic
arxiv.org

sgl-project/sglang-omni: SGLang Omni: High-Performance Multi-Stage Pipeline Framework for Omni Models

 Hardware Acceleration  Content type: Code
github.com

AutoMegaKernel: A Statically-Checked Agent Harness for Self-Retargeting Megakernel Synthesis

 🤖AI  Content type: Academic
arxiv.org · Hacker News

Density Field State Space Models: 1-Bit Distillation, Efficient Inference, and Knowledge Organization in Mamba-2

 Hardware Acceleration  Content type: Academic
arxiv.org

maziyarpanahi/openmed: open-source healthcare ai

 🤖AI  Content type: Code
github.com

LLM-Based Porting of Optimized C++ to CUDA Through Deoptimization and Reoptimization

 💬LLMs  Content type: Academic
arxiv.org

SET: Stream-Event-Triggered Scheduling for Efficient CUDA Graph Pipelines

 Hardware Acceleration  Content type: Academic
arxiv.org
