🔧 PTX - miterion · Scour

From Buffers to Registers: Unlocking Fine-Grained FlashAttention with Hybrid-Bonded 3D NPU Co-Design

arxiv.org·9h

⚡Flash Attention

htfab/microlane: Self-contained RTL to GDS flow for simple chip designs

github.com·15h

Zvec: SQLite-like simplicity in an embedded vector database (By Alibaba)

zvec.org·1h

AI Inference Needs A Mix-And-Match Memory Strategy

semiengineering.com·6h

🎯Tensor Cores

Fine Grained Everything, and what comes after React Server Components

blog.logrocket.com·1d

Intel "Nova Lake" Compute Tile Die-sizes Surface

techpowerup.com·1d

🔲Loop Tiling

MSI GeForce RTX 5090 Lightning Z review – Lightning-fast and thirsty unicorn in battle against NVIDIA’s clock speed barriers

igorslab.de·18m

📈GPU Occupancy

Ph42oN / dxvk-gplasync

gitlab.com·16h

⏱️CUDA Events

Hitting 1,000 tokens per second on a single RTX 5090

blog.alpindale.net·3d

🎛️CUDA Optimization

New AMD Adrenalin Driver

bluesnews.com·12h

One Platform to Run Apps, Data, and AI Anywhere

nutanix.com·18h

⚡ONNX Runtime

CodeSOD: Consistently Transactional

thedailywtf.com·7h

🌳Git Internals

How Andrej Karpathy Built a Working Transformer in 243 Lines of Code

analyticsvidhya.com·1h

📜TorchScript

Passing the Torch: Reflections on ARC’s Journey and the Future of Specialized Processing

eetimes.com·2d

⚡Flash Attention

CPU cloth simulation performance comparable to GPU SotA

sig25ddmpd.github.io·13h

Results from the Advent of FPGA Challenge

blog.janestreet.com·10h

🎯Tensor Cores

AndPuQing/gflow: A lightweight, single-node GPU job scheduler implemented in Rust.

github.com·1d

📊CUDA Graphs

Show HN: Solving Sudoku reasoning via Energy Geometric models

davisgeometric.com·4h

How Anam Achieved 250% Faster Inference Using Zymtrace Continuous GPU Profiling

zymtrace.com·3d

How to connect Convex to RunPod for serverless GPU workloads

stack.convex.dev·2d
