🎮 SIMT Execution - hello

Discussed on Hacker News

⚡Hardware Acceleration i-programmer.info·

Lemonade SDK Adds Nvidia CUDA Support

Covers Show HN: Lemonade: Run LLMs Locally with GPU and NPU Acceleration

🎨Chroma Towards Data Science·

GPU-Resident Top-K for Agentic RAG: I Built a CUDA Kernel So My Retrieval Step Would Stop Bouncing Off the GPU

⚡Hardware Acceleration Carscoops·

Got A Bugatti W16 Lying Around? A Designer Has The Perfect ‘Cuda For It

Less-relevant results

🌟Ray Tracing research.colfax-intl.com·

NVFP4 Blockscaled GEMM on NVIDIA RTX Pro Blackwell GPUs (SM12x)

Discussed on Hacker News

⚡Hardware Acceleration developer.nvidia.com·

Boosting MoE Training Throughput with Advanced Fusion Kernels

🔧LLVM IR Optimization hiraditya.github.io·

Loop Unrolling in the ML Era

Discussed on Hacker News

💬Prompt Engineering DEV Community·

llama-bench skipped FA on capable GPUs — b9437 corrects it

Covers 2 stories including GitHub here. You can follow the build instructions below as well. Change -DGGML_CUDA=ON to -DGGML_CUDA=OFF if you don't have a GPU or just want CPU inferen...

Discussed on DEV

⚡Hardware Acceleration GitHub·

I got tired of not understanding how vLLM works under the hood, so I built my own mini inference engine from scratch.

Discussed on r/LLM

🔬Deep Learning GitHub·

GPU Puzzles (2021)

Discussed on Hacker News

⚡Hardware Acceleration DEV Community·

Registers, Lanes, and Berry Phase: Lifting Siunertaq from Batch Graphs to the Complex Plane

Discussed on DEV

⚛️Quantum Computing arxiv.org·

Diagonal-Budgeted Trotterization for Efficient Quantum Hamiltonian Simulation

🔬Deep Learning GitHub·

open-source Jarvis project

Discussed on r/LLM

⚡Hardware Acceleration GitHub·

Running a 35B MoE model on a 2017 AMD RX 580 8GB via Vulkan (no ROCm/CUDA)

Discussed on Hacker News

⚡Hardware Acceleration GitHub·

Show HN: cuTile Rust: Safe, data-race-free GPU kernels in Rust

Covers 2 stories including AlterLang InterCode: A Native Intercomprehension Paradigm in Programming, Powered by GuruDev

Covered by indiehacker.news

Discussed on Hacker News and DEV

🔬Deep Learning GitHub

pytorch/executorch ciflow/cuda/20288

🎨Chroma GitHub·

Pipeline-parallel LLM inference across GPUs on separate machines

Discussed on Hacker News

🔬Deep Learning GitHub

pytorch/executorch ciflow/cuda/20384

From Tokens to Regions: CUDA-Sensitive Instruction Tuning for GPU Kernel Generation

The Most Important Nvidia Product Isn't a Chip. It's This.

Show HN: NanoEuler – GPT-2 scale model in pure C/CUDA from scratch
