⚡ CUDA - moyutianzun

🔺Triton Blog

blog.adafruit.com·

WarpGuard: Protected-Site Control-Flow Integrity for CUDA SASS Binaries

🔺Triton Academic

arxiv.org·

RightNow-AI/AutoMegaKernel: An agent harness that compiles a model into one provably-correct, self-retargeting CUDA megakernel and self-tunes it past cuBLAS at batch-1 LLM decode.

💾KV Cache Code

github.com··Hacker News

NVIDIA Nsight Compute

🔺Triton

developer.nvidia.com·

Exploiting GPU Tensor Cores from Java using Babylon

🔺Triton

inside.java·

Less-relevant results

NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI

🔄Transformers Blog

blogs.nvidia.com·

Proton Experimental gets fixes for Path of Exile 1 & 2, Guild Wars 2, Call of Duty (2003), Exanima and more

🔺Triton News

gamingonlinux.com·

Framework Desktop AMD 395+ (rdna 3.5) cannot run confyui err Fix 2026

🔺Triton Blog

runaihome.com··DEV

Building & Benchmarking: LLMs on a 16GB Jetson Orin NX for Hermes Agent

⚡Inference Optimization Blog

dnhkng.github.io·

Apple expands Private Cloud Compute to Google Cloud and NVIDIA hardware

🔲TPU Architecture

4sysops.com·

NVIDIA chip powers local AI workloads

🤖agentic system

edn.com·

From GPU to Token: The 8-Layer Observability Stack for AI Infrastructure

⚡Inference Optimization Blog

jimmysong.io·

Core Automation co-founder Jerry Tworek jokes that Nvidia's CUDA translates to miracles in Polish

🔺Triton

digg.com·

Google-SpaceX $30B Compute Deal Raises Cloud Buyer Questions

🔲TPU Architecture

techrepublic.com·

AMD's Lemonade SDK For Local AI Adds NVIDIA CUDA Support

💾KV Cache

phoronix.com··r/artificial

Nvidia RTX Spark: The $2,900 Floor Tells You Everything

🤖agentic system Blog Discussion

tildalice.io·

Ollama 0.30 GPU Boost: Faster local Qwen inference on NVIDIA

🔺Triton

everylocalai.com··DEV

Apple WWDC On-Device AI Deep Dive - Google Docs

🎛️Fine-Tuning

gist.is··Hacker News

Exploiting GPU Tensor Cores from Java using Babylon [Juan Fumero]

CUDA-Oxide 0.2 Brings Early Improvements To Pure Rust CUDA Kernels

GPUsnek is Python on nVidia’s CUDA
