Why vLLM is the best choice for AI inference today
developers.redhat.com · 3d
ONNX
Inference Acceleration from the Ground Up
semiwiki.com · 4d
Tensor Cores
Ubuntu Blog: Why we brought hardware-optimized GenAI inference to Ubuntu
ubuntu.com · 3d
ONNX Runtime
A generative dual-input model based on architectural computational optimization and multi-attention mechanism for remaining useful life prediction
sciencedirect.com · 8h
Attention Kernels
Prediction: AMD Will Be Worth More Than Broadcom by 2030
fool.com · 6h
Nsight
NVIDIA and Samsung working even closer together, new semiconductor AI factory has 50,000+ GPUs
tweaktown.com · 21h
Nsight
Platform generated AI slop at scale
markjgsmith.com · 1h
AI Coding Tools
TIL: For long-lived LLM sessions, swapping KV Cache to RAM is ~10x faster than recalculating it. Why isn't this a standard feature?
Loop Tiling
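The claim above is about reusing a session's prefill state: instead of recomputing attention keys and values when a conversation resumes, the cache is parked in host RAM and copied back to the GPU. A minimal sketch of that idea, assuming PyTorch and a KV cache laid out as a per-layer list of (key, value) tensors; the layout, function names, and use of pinned memory are illustrative assumptions, not any specific framework's API:

import torch

def offload_kv_cache(kv_cache):
    # Park each (key, value) pair in pinned host RAM so resuming the
    # session is a memcpy back to the GPU rather than a full prefill.
    cpu_cache = []
    for k, v in kv_cache:
        k_cpu = torch.empty(k.shape, dtype=k.dtype, device="cpu", pin_memory=True)
        v_cpu = torch.empty(v.shape, dtype=v.dtype, device="cpu", pin_memory=True)
        k_cpu.copy_(k, non_blocking=True)
        v_cpu.copy_(v, non_blocking=True)
        cpu_cache.append((k_cpu, v_cpu))
    torch.cuda.synchronize()  # wait for the async device-to-host copies
    return cpu_cache

def restore_kv_cache(cpu_cache, device="cuda"):
    # Copy the parked tensors back when the session becomes active again.
    return [(k.to(device, non_blocking=True), v.to(device, non_blocking=True))
            for k, v in cpu_cache]

Whether this beats recomputation depends on prompt length and PCIe bandwidth; the ~10x figure is the post's own observation, not something this sketch reproduces.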
It turns out WDDM driver mode is making our RAM-to-GPU transfers much slower compared to TCC or MCDM mode. Has anyone figured out how to bypass the NVIDIA software ...
CUDA Events
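One way to sanity-check the driver-mode effect described above is to time host-to-device copies with CUDA events, which is what the tag refers to. A minimal sketch, assuming PyTorch with a CUDA device; the buffer size and the pinned/pageable comparison are illustrative choices, not measurements from the post:

import torch

def h2d_bandwidth_gbs(size_mb=256, pinned=True):
    # Time a single host-to-device copy with CUDA events and return GB/s.
    n = size_mb * 1024 * 1024
    src = torch.empty(n, dtype=torch.uint8, pin_memory=pinned)
    dst = torch.empty(n, dtype=torch.uint8, device="cuda")

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    dst.copy_(src, non_blocking=True)
    end.record()
    torch.cuda.synchronize()  # make sure both events have completed

    ms = start.elapsed_time(end)
    return (size_mb / 1024) / (ms / 1000.0)

if __name__ == "__main__":
    print(f"pinned:   {h2d_bandwidth_gbs(pinned=True):.1f} GB/s")
    print(f"pageable: {h2d_bandwidth_gbs(pinned=False):.1f} GB/s")

Running the same script under WDDM and under TCC/MCDM (where the card supports switching) would show whether the slowdown reproduces on a given setup.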
Finetuning Open-source models with Opus, Sonnet 4.5 and Haiku 4.5
Model Quantization