📊 CUDA Graphs - miterion · Scour

AI in Multiple GPUs: Understanding the Host and Device Paradigm

towardsdatascience.com·15h

⏱️CUDA Events

Beyond a Single Queue: Multi-Level-Multi-Queue as an Effective Design for SSSP problems on GPUs

arxiv.org·1d

🌊CUDA Streams

Memgraph 3.8 is Out: Atomic GraphRAG + Vector Single Store With Major Performance Upgrades

memgraph.com·10h·Discuss: Hacker News

⚡CUDA Programming Patterns

AndPuQing/gflow: A lightweight, single-node GPU job scheduler implemented in Rust.

github.com·1d·Discuss: Hacker News

⏱️CUDA Events

Execution-Centric Characterization of FP8 Matrix Cores, Asynchronous Execution, and Structured Sparsity on AMD MI300A

arxiv.org·23h

🌊CUDA Streams

Show HN: Solving Sudoku reasoning via Energy Geometric models

davisgeometric.com·19h·Discuss: Hacker News

Building a Zero-Dependency secp256k1 CUDA Engine from Scratch (2.5B ops/SEC)

github.com·1d·Discuss: Hacker News

Building Mission Control for My AI Workforce: Introducing OpenClaw Command Center

jontsai.com·5h·Discuss: Hacker News

🤖AI Coding Tools

The Efficiency Wall: Why the Next 1,000x Leap Isn’t More GPUs

pub.towardsai.net·1d

🌊CUDA Streams

Two Ways to Move Tensors Without Stopping: Inside vLLM's Async GPU Transfer Patterns

dev.to·1d·Discuss: DEV

🌊CUDA Streams

A RISC-V vector extension primer

blog.adafruit.com·12h

Nvidia DGX Spark update cuts idle power by 32% or more — hot-plug detection on ConnectX NIC makes for a more efficient AI workstation

tomshardware.com·8h

Nvidia’s new technique cuts LLM reasoning costs by 8x without losing accuracy

venturebeat.com·6h

Discussion - Investigation of Single Thread CPU "Throughput/cycle"

forums.anandtech.com·1d

📊Profiling Tools

AI, GPU, And HPC Data Centers: The Infrastructure Behind Modern AI

semiengineering.com·20h

⏱️CUDA Events

From hand-tuned to generated: A reproducible Triton GPU kernel benchmark across different vendors

next.redhat.com·11h

⏱️CUDA Events

OLIX: Compute Manifesto

olix.com·1d·Discuss: Hacker News

⚡CUDA Programming Patterns

Introducing Dedicated Container Inference: Delivering 2.6x faster inference for custom AI models

together.ai·1d

⚡ONNX Runtime

a Linux VM manager with easy GPU-passthrough and more

vm-curator.org·9h·Discuss: Hacker News

Linux 7.0 Performance Events Prep For Intel Xeon Diamond Rapids

phoronix.com·15h

⏱️CUDA Events
