🧮 Parallel Prefix Scan - surajkadapa · Scour

KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++

🔢Tensor Cores Code

github.com · Hacker News

WarpGuard: Protected-Site Control-Flow Integrity for CUDA SASS Binaries

⚡Hardware Acceleration Academic

Less-relevant results

Training Cycle Halved: LoongForge End-to-End Optimization for GR00T N1.6 Delivers 2.3× Throughput

🖥️GPU Computing

baidu-baige.github.io · Hacker News

Nvidia GeForce RTX 2080 Ti Super prototype shows what could have been, with 4,608 CUDA cores

🖥️GPU Computing

GPUsnek is Python on nVidia’s CUDA

🖥️GPU Computing Blog

blog.adafruit.com

AMD's Lemonade SDK For Local AI Adds NVIDIA CUDA Support

⚡Hardware Acceleration

phoronix.com · r/artificial · Cited by 1 article

Polars GPU engine — cudf 26.06.01 documentation

🖥️GPU Computing Reference

docs.rapids.ai · Hacker News

Framework Desktop AMD 395+ (rdna 3.5) cannot run confyui err Fix 2026

⚡Hardware Acceleration Blog

runaihome.com · DEV

RTX 5080 + RTX 3090 Setup: 80+ Tok/s on Qwen 3.6 27B Q8

🥾Bootloaders Blog

imil.net · Hacker News, r/LocalLLaMA · Cited by 2 articles

Gerrymandering the Warp: Non-Control-Data Attacks on CUDA Collective Decision

⚡Hardware Acceleration Academic

Exploiting GPU Tensor Cores from Java using Babylon [Juan Fumero]

🔢Tensor Cores

openjdk.org · Lobsters, r/java

How to fit Qwen 3.6 35B A3B into 16GB of VRAM, & run it with Llama.cpp on an RTX 3080

⚡Hardware Acceleration

autodidacts.io

Redditor buys RTX 2080 Ti Super engineering sample on eBay, has the same number of cores as an RTX Titan but half the VRAM

🖥️GPU Computing News

tweaktown.com

Nvidia’s RTX Spark to fuel Adobe creative apps

🖥️GPU Computing

jonpeddie.com

NVIDIA RTX Pro 6000 Blackwell: 96GB GDDR7 and the End of VRAM Anxiety

🖥️GPU Computing Blog

fitservers.com

Making FlashAttention-4 faster for inference

🔢Tensor Cores Blog

modal.com · Hacker News

Building & Benchmarking: LLMs on a 16GB Jetson Orin NX for Hermes Agent

🖥️GPU Computing Blog

dnhkng.github.io

hasktorch/hasktorch: Tensors and neural networks in Haskell

🤖AI Code

nomp: A Framework for Building Domain Specific Compilers

⚡Hardware Acceleration Academic

Flatpak 1.18 adds AMD ROCm support, improved error output, and faster Fish shell start-up

⚡Hardware Acceleration

alternativeto.net
