Parallel Prefix Scan

KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++

 🔢Tensor Cores  Content type: Code
github.com · Hacker News

WarpGuard: Protected-Site Control-Flow Integrity for CUDA SASS Binaries

 Hardware Acceleration  Content type: Academic
arxiv.org

Less-relevant results

Training Cycle Halved: LoongForge End-to-End Optimization for GR00T N1.6 Delivers 2.3× Throughput

 🖥️GPU Computing

Nvidia GeForce RTX 2080 Ti Super prototype shows what could have been, with 4,608 CUDA cores

 🖥️GPU Computing
club386.com

GPUsnek is Python on nVidia’s CUDA

 🖥️GPU Computing  Content type: Blog
blog.adafruit.com

AMD's Lemonade SDK For Local AI Adds NVIDIA CUDA Support

 Hardware Acceleration

Polars GPU engine — cudf 26.06.01 documentation

 🖥️GPU Computing  Content type: Reference

Framework Desktop AMD 395+ (rdna 3.5) cannot run confyui err Fix 2026

 Hardware Acceleration  Content type: Blog
runaihome.com · DEV

RTX 5080 + RTX 3090 Setup: 80+ Tok/s on Qwen 3.6 27B Q8

 🥾Bootloaders  Content type: Blog

Gerrymandering the Warp: Non-Control-Data Attacks on CUDA Collective Decision

 Hardware Acceleration  Content type: Academic
arxiv.org

Exploiting GPU Tensor Cores from Java using Babylon [Juan Fumero]

 🔢Tensor Cores
openjdk.org · Lobsters, r/java

How to fit Qwen 3.6 35B A3B into 16GB of VRAM, & run it with Llama.cpp on an RTX 3080

 Hardware Acceleration
autodidacts.io

Redditor buys RTX 2080 Ti Super engineering sample on eBay, has the same number of cores as an RTX Titan but half the VRAM

 🖥️GPU Computing  Content type: News
tweaktown.com

Nvidia’s RTX Spark to fuel Adobe creative apps

 🖥️GPU Computing
jonpeddie.com

NVIDIA RTX Pro 6000 Blackwell: 96GB GDDR7 and the End of VRAM Anxiety

 🖥️GPU Computing  Content type: Blog
fitservers.com

Making FlashAttention-4 faster for inference

 🔢Tensor Cores  Content type: Blog
modal.com · Hacker News

Building & Benchmarking: LLMs on a 16GB Jetson Orin NX for Hermes Agent

 🖥️GPU Computing  Content type: Blog
dnhkng.github.io

hasktorch/hasktorch: Tensors and neural networks in Haskell

 🤖AI  Content type: Code
github.com

nomp: A Framework for Building Domain Specific Compilers

 Hardware Acceleration  Content type: Academic
arxiv.org

Flatpak 1.18 adds AMD ROCm support, improved error output, and faster Fish shell start-up

 Hardware Acceleration
alternativeto.net
