🎯 Tensor Cores - miterion

⚡Cuda Academic

arxiv.org·

Less-relevant results

NVIDIA chip powers local AI workloads

🎮NVIDIA

edn.com·

The Dragonbond by Abyss Connection

⚡CUDA Programming Patterns

pouet.net·

Gram Newton-Schulz: A Fast, Hardware-Aware Newton-Schulz Algorithm for Muon

🔢cuBLAS Blog

tridao.me··Hacker News

Vortex expands open RISC-V graphics

⚡Cuda

jonpeddie.com·

Vortex 3.0 Released As Full-Stack, Open-Source RISC-V GPU Now With 3D Pipeline

⚡Cuda

phoronix.com·

GPU Servers for Best Performance

🔥PyTorch

leaseweb.com··DEV

RightNow-AI/AutoMegaKernel: An agent harness that compiles a model into one provably-correct, self-retargeting CUDA megakernel and self-tunes it past cuBLAS at batch-1 LLM decode.

🔢cuBLAS Code

github.com··Hacker News

NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI

🎮NVIDIA Blog

blogs.nvidia.com·

Build a local voice agent with Red Hat OpenShift AI

🔥PyTorch

developers.redhat.com·

From GPU to Token: The 8-Layer Observability Stack for AI Infrastructure

🔥PyTorch Blog

jimmysong.io·

Density Field State Space Models: 1-Bit Distillation, Efficient Inference, and Knowledge Organization in Mamba-2

🔥PyTorch Academic

arxiv.org·

The Real Fix for Surprise Billing Requires Both Sides to Give

🐕Ruff

aei.org·

AMD Radeon RX 9070 GRE vs. Nvidia GeForce RTX 5070

🔥PyTorch

club386.com·

NVIDIA's RTX 5060 May Finally Get The VRAM Upgrade Gamers Wanted

⚡Cuda News

hothardware.com·

1-bit and 1.58 bit LLM Benchmarking on Jetson Orin Nano Super | Bonsai LM

⏱️Benchmarking

smolhub.com··r/LocalLLaMA

Show HN: One-Shot Program Generation Through Direct Memory Diffusion

🔥PyTorch Code

github.com··Hacker News

Exploiting GPU Tensor Cores from Java using Babylon [Juan Fumero]

KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++

APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing
