🟢 CUDA - nayyara.airlangga

RightNow-AI/AutoMegaKernel: An agent harness that compiles a model into one provably-correct, self-retargeting CUDA megakernel and self-tunes it past cuBLAS at batch-1 LLM decode.

🔢GEMM Optimization Code

github.com · Hacker News

Density Field State Space Models: 1-Bit Distillation, Efficient Inference, and Knowledge Organization in Mamba-2

🎮GPU Computing Academic

arxiv.org

NVIDIA Nsight Compute

🎮GPU Computing

developer.nvidia.com

NVIDIA Confidential Computing to Help Expand Apple’s Private Cloud Compute

🎮GPU Computing Blog

blogs.nvidia.com

Ollama 0.30 delivers faster NVIDIA GPU performance and wider hardware support

🧠Inference Engineering

alternativeto.net

Less-relevant results

Apple expands Private Cloud Compute to Google Cloud and NVIDIA hardware

⚗️Kernel Fusion

4sysops.com

Nvidia RTX Spark: The $2,900 Floor Tells You Everything

🎮GPU Computing Blog Discussion

tildalice.io

Google Pays SpaceX $920M/Month for AI Compute (4 minute read)

💰Inference Cost

winbuzzer.com

NVIDIA chip powers local AI workloads

🎮GPU Computing

edn.com

Google-SpaceX $30B Compute Deal Raises Cloud Buyer Questions

☁️Cloud Infrastructure

techrepublic.com

how to make brave use nvidia gpu on ubuntu?

⚗️Kernel Fusion

lemmy.ml

From GPU to Token: The 8-Layer Observability Stack for AI Infrastructure

💰Inference Cost Blog

jimmysong.io

KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++

💾KV Cache Code

github.com · Hacker News

AutoMegaKernel: A Statically-Checked Agent Harness for Self-Retargeting Megakernel Synthesis

🎮GPU Computing Academic

arxiv.org · Hacker News

Particle: SpaceX Discloses $920 Million‑a‑Month Google Compute Deal Ahead of IPO

⚗️Kernel Fusion News

particle.news

Apple extends Private Cloud Compute to third-party data centers

☁️Cloud Infrastructure

helpnetsecurity.com

1-bit and 1.58 bit LLM Benchmarking on Jetson Orin Nano Super | Bonsai LM

⏱️Prefill Decoding

smolhub.com · r/LocalLLaMA

Exploiting GPU Tensor Cores from Java using Babylon [Juan Fumero]

CUDA-Oxide 0.2 Brings Early Improvements To Pure Rust CUDA Kernels

