⚡ GPU - plooh · Scour

Exploiting GPU Tensor Cores from Java using Babylon [Juan Fumero]

openjdk.org··Lobsters, r/java

AmrDeveloper/Turtle: A Heterogeneous Pythonic 🐍 language to practice targeting CPU & GPU in the same program on Mobile Devices, Influenced by Python, Mojo and CUDA

🐍Python Code

github.com··Hacker News

Training Cycle Halved: LoongForge End-to-End Optimization for GR00T N1.6 Delivers 2.3× Throughput

baidu-baige.github.io··Hacker News

Polars GPU engine — cudf 26.06.01 documentation

🔱Triton Reference

docs.rapids.ai··Hacker News

AutoMegaKernel: A Statically-Checked Agent Harness for Self-Retargeting Megakernel Synthesis

🚀CUDA Kernels Academic

arxiv.org··Hacker News

RTX 5080 + RTX 3090 Setup: 80+ Tok/s on Qwen 3.6 27B Q8

🤖llm Blog

imil.net··Hacker News, r/LocalLLaMA·Cited by 2 articles

Making FlashAttention-4 faster for inference

🚀CUDA Kernels Blog

modal.com··Hacker News

1-bit and 1.58-bit LLM Benchmarking on Jetson Orin Nano Super | Bonsai LM

smolhub.com··r/LocalLLaMA

Orchestrate your LLM pipeline. Locally

llmforge.app··Hacker News

CommBench: Can LLMs Write Correct and Efficient GPU Communication Code?

uccl-project.github.io··Hacker News

DeepSeekV4 1.6T Day 0 to Day 43 Performance Over Time - Huawei, GB300 NVL72, MI355X, B200

🤖llm News

newsletter.semianalysis.com··Hacker News·Cited by 1 article

4× RTX Pro 6000 Blackwell on Water, and the One Card That Wouldn't Behave

🤖llm Blog

sabareesh.com··Hacker News, r/LocalLLaMA

Personal AI for Research, Voice, and Everyday Tasks

whissle.ai··Hacker News

Less-relevant results

Local models in mid-2026: the engineering that closed the gap

coles.codes··Hacker News, r/LocalLLaMA

Agentic Memory Management for GPU Code Generation

🤖llm Blog

ucbskyadrs.github.io··Hacker News

Anyone been using CUDA 13.3 for the past week or 2?

🤖llm Code

github.com··r/LocalLLaMA

Mojo Nightly

🦀Rust Blog

mojolang.org··Hacker News

Why Compiler Engineers Rarely Use Strassen's Algorithm for Fast Matrix Multiplications

🦀Rust News Blog

leetarxiv.substack.com··Substack, r/programming

Two Leaps to 1000 Tokens/s on a 1T-Parameter Model: On Inference Systems, Execution Boundaries, and Co-Design

📶Beamforming Blog

tilert.ai··Hacker News·Cited by 2 articles

Profiling in PyTorch (Part 2): From nn.Linear to a Fused MLP

🐍Python Blog

huggingface.co··Hacker News·Cited by 1 article
