🎨 LUT Compression - matmat

Release TorchCodec 0.14: HDR Video Decoding for CPU & CUDA, and Fast Wav Decoder · meta-pytorch/torchcodec

⚡Parallel Computing Code

github.com · Hacker News

AgentCompile: An LLM-Guided Compiler for Direct CUDA Inference

⚡Parallel Computing Academic

arxiv.org

NVIDIA Nsight Compute

⚡Parallel Computing

developer.nvidia.com

Mosaic

🔓Open Source Software

flathub.org

Core Automation co-founder Jerry Tworek jokes that Nvidia's CUDA translates to miracles in Polish

⚡Parallel Computing

digg.com

New comment by bhvk08 in "Ask HN: Who wants to be hired? (June 2026)"

⚡Parallel Computing Discussion

news.ycombinator.com · Hacker News

Floor and Ceil Versus Denormals on CPU and GPU

📐Arithmetic Precision

asawicki.info

AMD's Lemonade SDK For Local AI Adds NVIDIA CUDA Support

⚡Parallel Computing

phoronix.com

Less-relevant results

Nvidia RTX Spark: The $2,900 Floor Tells You Everything

⚡Parallel Computing Blog Discussion

tildalice.io

RightNow-AI/AutoMegaKernel: An agent harness that compiles a model into one provably-correct, self-retargeting CUDA megakernel and self-tunes it past cuBLAS at batch-1 LLM decode.

🇨🇿Czech Computing Code

github.com · Hacker News

APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing

⚡Parallel Computing Academic

arxiv.org

AutoMegaKernel: A Statically-Checked Agent Harness for Self-Retargeting Megakernel Synthesis

⚡Parallel Computing Academic

arxiv.org · Hacker News

xarray/osgverse: osgVerse, a complete 3d engine solution based on OpenSceneGraph. It supports OpenGL/OpenGLES/Vulkan/DirectX/Metal backends, and also works on modern browsers using WASM.

🕸️WebAssembly Code

github.com

zed-industries/zed glsl-v0.2.4

🌳Incremental Parsing Code

github.com

KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++

⚡Parallel Computing Code

github.com · Hacker News

LLM-Based Porting of Optimized C++ to CUDA Through Deoptimization and Reoptimization

⚡Parallel Computing Academic

arxiv.org

SET: Stream-Event-Triggered Scheduling for Efficient CUDA Graph Pipelines

⚡Parallel Computing Academic

arxiv.org

heterodoxin/graphkv: Graph-guided KV cache compression for memory-efficient LLM inference.

💻Operating System, OS Code

github.com · r/LocalLLaMA

CUDA-Oxide 0.2 Brings Early Improvements To Pure Rust CUDA Kernels

Framework Desktop AMD 395+ (rdna 3.5) cannot run confyui err Fix 2026
