GPU Architecture

SK Hynix bets HBM, wins Nvidia jackpot

 🟩CUDA
jonpeddie.com·

Unreleased RTX 3050 Ti engineering sample appears in photos and benchmarks — the RTX 3060 alternative that never happened

 💡FlashAttention  Content type: News
tomshardware.com·

Big Blue’s Redbook on Storage Scale KV Cache management

 💡FlashAttention  Content type: News
blocksandfiles.com·

[News] HBF Spurs Equipment Race; Hanmi Semiconductor Eyes First TC Bonder Deliveries in 2H26

 💡FlashAttention  Content type: News
trendforce.com··r/hardware

Profiling in PyTorch (Part 2): From nn.Linear to a Fused MLP

 💻OS  Content type: Blog
huggingface.co··Hacker News

APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing

 💻OS  Content type: Academic
arxiv.org·

2026 budget phones to bring back the waterdrop notch, and that’s not the only downgrade

 🐧Kernel Dev
gizmochina.com·

Unreleased RTX 3050 Ti graphics card spotted in the wild, GA106 GPU with 6GB VRAM

 💡FlashAttention  Content type: News
tweaktown.com·

The Inference Alpha: Maximizing Frontier Models on AMD

 💻OS  Content type: Blog
digitalocean.com·

AMD Board Partners Allegedly Tip When RDNA 5 GPUs Might Arrive

 🟩CUDA  Content type: News
hothardware.com·

Vortex 3.0 Released As Full-Stack, Open-Source RISC-V GPU Now With 3D Pipeline

 🟩CUDA
phoronix.com·

'The thing that gives me hope is there is an enormous amount of capacity being built' - AMD's head of Ryzen and Radeon is pinning hopes of an end to the memory ...

 💡FlashAttention  Content type: News
pcgamer.com·

Nvidia selects top three vendors for critical AI memory

 💡FlashAttention
techzine.eu·

Google's latest DiffusionGemma open AI model comes with a 4x speed boost

 🟩CUDA  Content type: News
arstechnica.com·

Chip Industry Week In Review

 🐧Kernel Dev
semiengineering.com·

Holding the FP8 Quality Ceiling at 8-Bit Weights and Activations: INT8 and GGUF Post-Training Quantization of Ideogram 4.0 for Consumer GPUs

 💻OS  Content type: Academic
arxiv.org·

bigattichouse/packed-twin-inference: PTI achieves ~2× throughput using a single quantized model (Q5_K_M or better) by running 4 generation streams in one batched decode call. The GPU loads model weights once per step and produces 4 predictions simultaneously. KV cache overhead is ~0.8 GiB total for all 4 streams. No draft model. No quality loss

 💻OS  Content type: Code
github.com··r/LocalLLaMA

Google's new open model DiffusionGemma generates text from noise instead of word by word

 MLSys
the-decoder.com·

Two old GPUs I salvaged are doing more AI work than a brand new $2000 card, and I won't be upgrading anytime soon

 💻OS
xda-developers.com·

Industry coalition urges Trump administration to take urgent action as AI data centers' extreme memory consumption threatens other industries — AI-driven memory chip shortage could raise prices in automotive, medical, telecommunications sectors

 💡FlashAttention  Content type: News
tomshardware.com·
