FP8 Training

Train Models Faster with JAX and MaxText Using NVFP4 on NVIDIA Blackwell

💰 Inference Cost · Content type: News, Blog
developer.nvidia.com

Alignment Collapse Under KV Cache Quantization: Diagnosis and Mitigation

💰 Inference Cost · Content type: Academic
arxiv.org

Nvidia DGX Spark GB10 – AI Models and Guide with vLLM and Autonomous Script

⏱️ Prefill Decoding · Content type: Code
github.com · Hacker News

2x GH200 for LLM inference, Part 2: vLLM, DeepSeek V4 Flash, and MTP

🧠 Inference Engineering · Content type: Blog
dnhkng.github.io
Less-relevant results

Youssof Altoukhi (@Youssofal_)

🧠 Inference Engineering
xcancel.com · r/LocalLLaMA

MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 TPS

💰 Inference Cost · Content type: Blog

A system programmer’s guide to LLM inference

💰 Inference Cost · Content type: Blog

DeepSeek V4, LeCun's Bet Against LLMs, and Lovable's Self-Improving Agent - The Tokenizer Edition #30

🧠 Inference Engineering

libertywing/FlashMemory-Deepseek-V4: FlashMemory DS-V4 Retriever: a lightweight retriever that sparsifies DeepSeek-V4 CSA KV-cache. Weights available on Hugging Face.

🧠 Inference Engineering · Content type: Code
github.com

DeepSeekV4 1.6T Day 0 to Day 43 Performance Over Time - Huawei, GB300 NVL72, MI355X, B200

🧠 Inference Engineering · Content type: News

Achieving Cloud-Grade SLOs for Local Mixture-of-Experts Inference through CPU-GPU Hybrid Design

💻 Systems Programming · Content type: Academic
arxiv.org

Apple rebuilt its on-device AI stack at WWDC 2026

🔢 GEMM Optimization · Content type: Blog
ziraph.com · Hacker News

3x Faster Search: Parallel Test-Time Scaling with Instructed-Retriever-1

💰 Inference Cost · Content type: Blog
databricks.com

The economics of speculative decoding

🚀 Speculative Decoding · Content type: Blog

{ "id": "247ea069-731d-4b79-9d64-8807463de95c", "revision": 0, "last_no

📡 OpenTelemetry

not much happened today | AINews

🧠 Inference Engineering
news.smol.ai

An 84-Format Numeric Catalog with Bit-Exact Conformance Vectors: A Vendor-Neutral Reference for FP8, BF16, MXFP4, and Microscaling Formats

🪄 Chiplet Design · Content type: Academic
arxiv.org

Speculators v0.5.0: DFlash support and online training

🚀 Speculative Decoding
developers.redhat.com

"North Mini Code"; open weights, 30B param, Canadian coding model

⏱️ Prefill Decoding · Content type: Blog
cohere.com · Hacker News

Gigabyte AI Top 500: Local 600B Parameter LLM Desktop Training Hardware

🎮 GPU Computing
armdevices.net