🎮 GPU Architecture - emulbasaka · Scour

SK Hynix bets HBM, wins Nvidia jackpot

jonpeddie.com·

Unreleased RTX 3050 Ti engineering sample appears in photos and benchmarks — the RTX 3060 alternative that never happened

💡FlashAttention News

tomshardware.com

Big Blue’s Redbook on Storage Scale KV Cache management

💡FlashAttention News

blocksandfiles.com·

[News] HBF Spurs Equipment Race; Hanmi Semiconductor Eyes First TC Bonder Deliveries in 2H26

💡FlashAttention News

trendforce.com··r/hardware

Profiling in PyTorch (Part 2): From nn.Linear to a Fused MLP

💻OS Blog

huggingface.co··Hacker News

APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing

💻OS Academic

2026 budget phones to bring back the waterdrop notch, and that’s not the only downgrade

gizmochina.com·

Unreleased RTX 3050 Ti graphics card spotted in the wild, GA106 GPU with 6GB VRAM

💡FlashAttention News

tweaktown.com·

The Inference Alpha: Maximizing Frontier Models on AMD

💻OS Blog

digitalocean.com·

AMD Board Partners Allegedly Tip When RDNA 5 GPUs Might Arrive

🟩CUDA News

hothardware.com·

Vortex 3.0 Released As Full-Stack, Open-Source RISC-V GPU Now With 3D Pipeline

'The thing that gives me hope is there is an enormous amount of capacity being built' - AMD's head of Ryzen and Radeon is pinning hopes of an end to the memory ...

💡FlashAttention News

Nvidia selects top three vendors for critical AI memory

💡FlashAttention

Google's latest DiffusionGemma open AI model comes with a 4x speed boost

🟩CUDA News

arstechnica.com·

Chip Industry Week In Review

semiengineering.com·

Holding the FP8 Quality Ceiling at 8-Bit Weights and Activations: INT8 and GGUF Post-Training Quantization of Ideogram 4.0 for Consumer GPUs

💻OS Academic

bigattichouse/packed-twin-inference: PTI achieves ~2× throughput using a single quantized model (Q5_K_M or better) by running 4 generation streams in one batched decode call. The GPU loads model weights once per step and produces 4 predictions simultaneously. KV cache overhead is ~0.8 GiB total for all 4 streams. No draft model. No quality loss

💻OS Code

github.com··r/LocalLLaMA

Google's new open model DiffusionGemma generates text from noise instead of word by word

the-decoder.com

Two old GPUs I salvaged are doing more AI work than a brand new $2000 card, and I won't be upgrading anytime soon

xda-developers.com·

Industry coalition urges Trump administration to take urgent action as AI data centers' extreme memory consumption threatens other industries — AI-driven memory chip shortage could raise prices in automotive, medical, telecommunications sectors

💡FlashAttention News

tomshardware.com
