🏎️ TensorRT - miterion · Scour

Artain-AI/ignite-ms: Fast self-hosted embedding engine for search, RAG, and reindexing workloads on NVIDIA GPUs. Built in Rust + TensorRT for teams that care about scale, cost, and control. ⚡ONNX Runtime

github.com·23h·Hacker News

Need a second pair of eyes, this Qwen3.6 27B quant recipe consistently thinks less and is correct ⚡ONNX Runtime

huggingface.co·6d·r/LocalLLaMA

Architecture Dependent Temporal Observability Under Deployment Interference in Edge Inference Systems ⏱️CUDA Events

GPU Memory Math for LLMs: Formula That Tells You What Fits on Your GPU 📈GPU Occupancy

theahmadosman.substack.com·19h·Substack, r/LocalLLaMA

12x faster Elasticsearch vector indexing: deploying NVIDIA cuVS with GPU and CPU tiers 📈GPU Occupancy

reComputer RK3576/RK3588 Edge AI computers are supported by reComputer AI Lab one-click deployment platform ⚡ONNX Runtime

cnx-software.com·3h

Inside the M4 Apple Neural Engine, Part 2: ANE Benchmarks 🎯Tensor Cores

maderix.substack.com·3d·Substack

Bolt Challenges Nvidia With a Focus on Cutting-Edge Graphics 🎯GPU Kernels

spectrum.ieee.org·4h·Hacker News

PyTorch vs TensorFlow Syntax: 15 Operations Side-by-Side 📜TorchScript

tildalice.io·2d

KV Cache and Flash Attention with interactive diagrams 🔲Loop Tiling

kvcache.cobanov.dev·21h·Hacker News

Deep Moats and Platform Shifts in Computing 🌊CUDA Streams

semiconductor.substack.com·3d·Substack

Running PyTorch Models on Apple Silicon GPUs with the ExecuTorch MLX Delegate ⚡ONNX Runtime

pytorch.org·2d·Hacker News

Flipper One Tech Specs ⏱️Benchmarking

docs.flipper.net·22h

I tried 4 LLM speedup techniques on CPU. Three made it slower. 📊Profiling Tools

deemwar-products.github.io·22h·Hacker News

Initial Benchmarks Of The SpacemiT K3 RVA23 RISC-V CPU With The K3 Pico-ITX 📈Occupancy Optimization

phoronix.com·1d·Hacker News

AMD promises to bring improved, hardware-backed FSR 4 upscaling to older Radeon GPUs 🎯GPU Kernels

arstechnica.com·6d

Less-relevant results

Singtech's 200 TOPS AI PC Aims to Move Large AI Models Off the Cloud 🔗NCCL

briefglance.com·10h

China unveils a CPU-only supercomputer capable of 1.54 exaflops — LineShine LX2 packs a frankly ridiculous 2.4 million Armv9 cores from Huawei 🌊CUDA Streams

·1d

Forlinx rolls out FET3572-C SoM and OK3572-C board with Rockchip RK3572 🧠CPU Architecture

linuxgizmos.com·3d

LLM Inference 🎓Model Distillation

iop.systems·14h
