Tensor Cores

NVIDIA A100 vs RTX 4090 for AI Workloads: The Cost Per FLOP Reality

AI, GPU IC, SOC IC
Content type: Blog
fitservers.com

A Fast Locality Simulator for GEMM Design-Space Exploration on Multi-Chiplet GPUs

AI, GPU IC, SOC IC
Content type: Academic
arxiv.org
Less-relevant results

Apple WWDC On-Device AI Deep Dive - Google Docs

Edge AI
gist.is
Hacker News

Unsloth Gemma 4 QAT

馃NPU
unsloth.ai

Profiling in PyTorch (Part 2): From nn.Linear to a Fused MLP

AI, GPU IC, SOC IC
Content type: Blog
huggingface.co
Hacker News

KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++

AI, GPU IC, SOC IC
Content type: Code
github.com
Hacker News

OpenCV 5 Debuts with Improved ONNX Support and Native AI Upgrades

Image Processing
Content type: News
hackster.io

DiffusionGemma: The Developer Guide - Google Developers Blog

馃NPUContent type: Blog

Running Qwen 35B MoE at 450k Context on a Single 32GB GPU

Edge AI

Making Locality-aware GEMM Compatible with Page-Granularity Placement on Chiplet GPUs

AI, GPU IC, SOC IC
Content type: Academic
arxiv.org

Benchmarking dots.tts on Strix Halo

AI, GPU IC, SOC IC
sleepingrobots.com

NVIDIA Accelerates Google DeepMind's DiffusionGemma for Local AI

馃NPUContent type: Blog
blogs.nvidia.com

The economics of speculative decoding

馃NPUContent type: Blog
fergusfinn.comHacker News

CommBench: Can LLMs Write Correct and Efficient GPU Communication Code?

AI, GPU IC, SOC IC

Gram Newton-Schulz: A Fast, Hardware-Aware Newton-Schulz Algorithm for Muon

馃NPUContent type: Blog
tridao.meHacker News

Qwen 3.6 27B AutoRound GGUF, need your feedback

AI, GPU IC, SOC IC
huggingface.co
r/LocalLLaMA

Vortex 3.0 Released As Full-Stack, Open-Source RISC-V GPU Now With 3D Pipeline

RISC-V
phoronix.com

CoreML vs TFLite: iPhone 15 Pro GPU 2.3x Faster

Edge AI
Content type: Blog
Content type: Discussion
tildalice.io

Density Field State Space Models: 1-Bit Distillation, Efficient Inference, and Knowledge Organization in Mamba-2

馃NPUContent type: Academic
arxiv.org

From GPU to Token: The 8-Layer Observability Stack for AI Infrastructure

AI, GPU IC, SOC IC
Content type: Blog
jimmysong.io
