🔢 FP8 Training - nayyara.airlangga · Scour

"North Mini Code"; open weights, 30B param, Canadian coding model

⏱️Prefill Decoding Blog

cohere.com · Hacker News

heterodoxin/graphkv: Graph-guided KV cache compression for memory-efficient LLM inference.

💾KV Cache Code

github.com · r/LocalLLaMA

P-Cast Precision in FP8 Attention: Sink-Induced Collapse and the Optimality of S=2^8

💰Inference Cost Academic

Show HN: AutoGPU – AI designs a real 7nm GPU, from Verilog to GDSII

🧠HBM Bandwidth Code

github.com · Hacker News

DJI 20260603031949 0009 D CHAR withad ARTOP 500 Local LLM Motherboard with GRX50 and RTX 5090

🧠HBM Bandwidth

armdevices.net

harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.

💾KV Cache Code

github.com · Hacker News

SpenseGPT: Practical One-shot Pruning Enabling Sparse and Dense GEMMs for LLM Inference

💰Inference Cost Academic

Intel's mysterious new datacenter GPU is what Nvidia's Rubin CPX nearly was

theregister.com

How the UK Is Turning Sovereign AI Ambition Into Action With NVIDIA Technologies

🧠Inference Engineering Blog

blogs.nvidia.com

Forlinx launches Rockchip RK3572 system-on-module (SoM) and development board with Linux 6.12 BSP - CNX Software

🪄Chiplet Design

cnx-software.com

Ablation Study of Block Size, Weight Precision, and Scale Precision in NVFP4 Inference for Low-Power Edge-Efficient Neural Networks

🚀Speculative Decoding Academic

FSR 4.1 made older Radeon cards interesting again, but not for the reason AMD wants

🎮GPU Computing

xda-developers.com

Build a local voice agent with Red Hat OpenShift AI

🎮GPU Computing

developers.redhat.com

Does anyone know what PCIe mode was used for these benchmarks?

💾KV Cache Code

github.com · r/LocalLLaMA

[AINews] not much happened today

💰Inference Cost News
