⚡ Cuda - miterion · Scour

eitamring/gocudrv: Pure-Go CUDA Driver API bindings for Go. No cgo; runtime-loaded NVIDIA driver. ⏱️CUDA Events

github.com·4d·DEV

Deleting the 8.4GB Python Sidecar: Pure Go + CUDA with `CGO_ENABLED=0` ⏱️CUDA Events

eitamring.github.io·1d·DEV

Ryzen AI Halo is AMD’s $3,999 answer to maxing out ChatGPT 🔍Nsight

pcworld.com·6h

What GPU kernels mean for your distributed inference 🎯GPU Kernels

developers.redhat.com·1d

Trying to compile to Nvidia PTX, but Debug adds fns that crash and Release strips everything 🔧PTX

CUDA Books 🎯GPU Kernels

news.ycombinator.com·3d·Hacker News

Ollama Doesn't Know Its GPU Is on Another Machine ⏱️CUDA Events

loopholelabs.io·1d·Hacker News

China Reportedly Blocks Entry of Nvidia GeForce RTX 5090D v2 Graphics Cards 🎮NVIDIA

tech4gamers.com·22h

Rare sale pushes this M4 iMac down to its lowest price of the year ⚡Flash Attention

macworld.com·1h

How Synthesia optimizes generative AI video inference on Amazon EC2 G7e instances 🌊CUDA Streams

aws.amazon.com·2d

AMD Launches the Ryzen AI Max 400 Series Processors: "Strix Halo" Gets a Memory Upgrade 🧠CPU Architecture

techpowerup.com·17h

Deep Moats and Platform Shifts in Computing 🌊CUDA Streams

semiconductor.substack.com·3d·Substack

Wes McKinney releases multiple bangers 🌐Distributed Computing

kenn.io·37m·Hacker News

Lightweight Gaussian Process Inference in C++ on Metal and CUDA 🏎️TensorRT

AMD’s Next Big Chip Hopes To Beat Nvidia’s CPUs While They’re in the Crib 📈GPU Occupancy

gizmodo.com·17h

Luce DFlash + PFlash on 7900XTX: Qwen3.6-27B at 2.24x decode and 3.05x prefill vs llama.cpp HIP ⏱️Benchmarking

lucebox.com·3d·r/LocalLLaMA

The brain still needs the hammer: Why compilers matter MORE in the agent era, not less 🤖AI Coding Tools

scale-lang.com·2d·Hacker News

Save over $1,120 on this 4K-ready RTX 5070 Ti gaming PC with 32GB DDR5 and a 2TB SSD — score a big discount on this ABS Kaze II rig from Newegg with a 24-core Intel CPU and powerful Nvidia GPU 📈GPU Occupancy

tomshardware.com·5h

AMD just dropped a compact AI workstation that makes discrete GPUs look outdated for running LLMs 🔍Nsight

xda-developers.com·17h

michelangeloromerochisco/ternative: Inference engine for ternary-weight LLMs with runtime LoRA - the llama.cpp of BitNet models 🔄ONNX

github.com·1d·Hacker News
