Tensor Cores


APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing

🔲 AI, GPU IC, SOC IC · Content type: Academic
arxiv.org

Exploiting GPU Tensor Cores from Java using Babylon [Juan Fumero]

🔲 AI, GPU IC, SOC IC
openjdk.org · Lobsters, r/java

Making FlashAttention-4 faster for inference

🔲 AI, GPU IC, SOC IC · Content type: Blog
modal.com · Hacker News

The Inference Alpha: Maximizing Frontier Models on AMD

馃NPUContent type: Blog
digitalocean.com

NVIDIA RTX Pro 6000 Blackwell: 96GB GDDR7 and the End of VRAM Anxiety

🔲 AI, GPU IC, SOC IC · Content type: Blog
fitservers.com
Less-relevant results

Release v8.4.66 - Add `nvidia-ml-py` to pyproject.toml (#23922) · ultralytics/ultralytics

📱 Edge AI · Content type: Code
github.com

Exploiting GPU Tensor Cores from Java using Babylon

🔲 AI, GPU IC, SOC IC
inside.java

Intel's Open Image Denoise 2.5 Delivers Solid Performance Improvements For GPUs

🔓 RISC-V
phoronix.com

Train Models Faster with JAX and MaxText Using NVFP4 on NVIDIA Blackwell

馃NPUContent type: NewsContent type: Blog
developer.nvidia.com

Discrete Diffusion Modelling by Estimating the Ratios of the Data Distribution

馃NPUContent type: NewsContent type: Blog

Framework Desktop AMD 395+ (rdna 3.5) cannot run confyui err Fix 2026

🔲 AI, GPU IC, SOC IC · Content type: Blog
runaihome.com · DEV

NVIDIA chip powers local AI workloads

🔲 AI, GPU IC, SOC IC
edn.com

Rebellions Bets on Memory-Centric Architecture as it Weighs IPO Options

🔲 AI, GPU IC, SOC IC · Content type: News
eetimes.com

TileFuse: A Fused Mixed-Precision Kernel Library for Efficient Quantized LLM Inference on AMD NPUs

🔲 AI, GPU IC, SOC IC · Content type: Academic
arxiv.org

2x GH200 for LLM inference, Part 2: vLLM, DeepSeek V4 Flash, and MTP

🔲 AI, GPU IC, SOC IC · Content type: Blog
dnhkng.github.io

NVIDIA at Computex 2026: RTX Spark Gaming Hands-On, DLSS 4.5, and More

🔲 AI, GPU IC, SOC IC
techpowerup.com

Anatomy of a high-performance EP kernel

🔲 AI, GPU IC, SOC IC · Content type: Blog
fergusfinn.com · Hacker News

Two Leaps to 1000 Tokens/s on a 1T-Parameter Model: On Inference Systems, Execution Boundaries, and Co-Design

馃Agentic EngineeringContent type: Blog
tilert.aiHacker News

Nvidia GeForce RTX 50 Super GPUs may launch in early 2027 with 50% more VRAM

🔲 AI, GPU IC, SOC IC
club386.com

NVIDIA's RTX 5060 May Finally Get The VRAM Upgrade Gamers Wanted

🔲 AI, GPU IC, SOC IC · Content type: News
hothardware.com
