🟢 NVIDIA - kudolink · Scour

DeepSeekV4 1.6T Day 0 to Day 43 Performance Over Time - Huawei, GB300 NVL72, MI355X, B200

🧠LLMs News

newsletter.semianalysis.com··Hacker News

Release TorchCodec 0.14: HDR Video Decoding for CPU & CUDA, and Fast Wav Decoder · meta-pytorch/torchcodec

🎵Vibe Coding Code

github.com··Hacker News

Expanding Private Cloud Compute - Apple Security Research

☁️Cloud Computing Blog

security.apple.com··Lobsters, Hacker News, r/apple

Train Models Faster with JAX and MaxText Using NVFP4 on NVIDIA Blackwell

🏠Local LLMs News Blog

developer.nvidia.com·

NVIDIA, KRAFTON, NC and Reigning ‘League of Legends’ Champions T1 Celebrate RTX Spark at Korea’s PC Bangs

🧠Transformers Blog

blogs.nvidia.com·

Apple rebuilt its on-device AI stack at WWDC 2026

🛠️Developer Tools Blog

ziraph.com··Hacker News

CodegenBench: Can LLMs Write Efficient Code Across Architectures?

💻Code Generation Academic

arxiv.org··Hacker News

KaiFelixBennett/gemma4-turboquant-rdna4: Run Gemma-4-31B at full 256K context on a $1,400 AMD RDNA4 GPU (gfx1201): TurboQuant KV cache + HIP-graph-safe Flash-Attention for llama.cpp, fully measured on real hardware.

🤗Open Source AI Code

github.com··Hacker News

Microsoft continues its big Linux push at Build 2026

☁️Cloud Computing

zdnet.com··Hacker News

Why Compiler Engineers Rarely Use Strassen's Algorithm for Fast Matrix Multiplications

🏗️Software Architecture News Blog

leetarxiv.substack.com··Substack, r/programming

On-device AI is a margin decision

🏠Local LLMs Blog

ziraph.com··Hacker News

Fine-tune FLUX.2 [Klein] with a LoRA under 60 minutes

🤗Open Source AI Blog

huggingface.co··Hacker News

Huawei-led team claims it post-trained DeepSeek's 1.6-trillion-parameter model — 1,000 Ascend 910C chips used in training

🤗Open Source AI News

tomshardware.com··Hacker News

NVIDIA and LG Group Build an AI Factory to Advance Physical AI, Mobility and AI Infrastructure

🏗️Software Architecture Blog

blogs.nvidia.com··Hacker News

Ideogram-4-FP8 Brings High-Quality Text-to-Image Generation to More Hardware

✍️Prompt Engineering

hackernoon.com·

The Download: how the World Cup ball will fly and OpenAI’s “super app”

💻Tech Industry News

technologyreview.com··Hacker News

Apple Silicon's on-device AI bet hasn't moved – only the chip range that runs it

💻Tech Industry

tbreak.com··Hacker News, r/apple

Unpacking AI: The Hardware Behind AI

🕵️Agentic AI News

pathtostaff.com··Hacker News

Scarcity is driving AI innovation outside Silicon Valley

📈AI Industry

restofworld.org··Hacker News

bigattichouse/packed-twin-inference: PTI achieves ~2× throughput using a single quantized model (Q5_K_M or better) by running 4 generation streams in one batched decode call. The GPU loads model weights once per step and produces 4 predictions simultaneously. KV cache overhead is ~0.8 GiB total for all 4 streams. No draft model. No quality loss

🏠Local LLMs Code

github.com··r/LocalLLaMA
