🖥️ GPU Programming - jhcha.oyo

🤖AI Academic

arxiv.org · Hacker News

KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++

⚡Flash Attention Code

github.com · Hacker News

RightNow-AI/AutoMegaKernel: An agent harness that compiles a model into one provably-correct, self-retargeting CUDA megakernel and self-tunes it past cuBLAS at batch-1 LLM decode.

🤖AI Code

github.com · Hacker News

LLM-Based Porting of Optimized C++ to CUDA Through Deoptimization and Reoptimization

💬LLMs Academic

arxiv.org

SET: Stream-Event-Triggered Scheduling for Efficient CUDA Graph Pipelines

⚡Hardware Acceleration Academic

arxiv.org

xarray/osgverse: osgVerse, a complete 3D engine solution based on OpenSceneGraph. It supports OpenGL/OpenGLES/Vulkan/DirectX/Metal backends, and also works on modern browsers using WASM.

✨Computer Graphics Code

github.com

Communication Strategy Selection for Multi-GPU 3D FDTD with Convolutional Perfectly Matched Boundary Layers

✨Computer Graphics Academic

arxiv.org

Vulkan 1.4.353 Released With Three New Extensions

🎮Game Engines

phoronix.com

Trystan-SA/rproc: A Linux resource & process monitor inspired by Windows 11's Task Manager. Written in Rust with Slint

⚡Hardware Acceleration Code

github.com · DEV

On GPU Implementation for Multi-Precision Integer Division

⚡Hardware Acceleration Academic

arxiv.org

HigherOrderCO/Bend: A massively parallel, high-level programming language

✨Computer Graphics Code

github.com

MusaCoder: Native GPU Kernel Generation with Full-Stack Training on Moore Threads GPU

🎮Reinforcement Learning Academic

arxiv.org

zhongkaifu/TensorSharp: A C# inference engine for running large language models (LLMs) locally using GGUF model files. TensorSharp provides a console application, a web-based chatbot interface, and Ollama/OpenAI-compatible HTTP APIs for programmatic access. It supports Windows/macOS/Linux with full GPU capability.

💬LLMs Code

github.com · Hacker News

CodegenBench: Can LLMs Write Efficient Code Across Architectures?

🤖AI Academic

arxiv.org · Hacker News

Show HN: One-Shot Program Generation Through Direct Memory Diffusion

🤖AI Code

github.com · Hacker News

Vortex 3.0 Released As Full-Stack, Open-Source RISC-V GPU Now With 3D Pipeline

AutoMegaKernel: A Statically-Checked Agent Harness for Self-Retargeting Megakernel Synthesis
