FP8 Training
Specific
FP8, float8, mixed precision, H100 transformer engine
Scoured 35 posts in 6.8 ms

Train Models Faster with JAX and MaxText Using NVFP4 on NVIDIA Blackwell
💰 Inference Cost
Content type: News, Blog
developer.nvidia.com · 2 days ago

Alignment Collapse Under KV Cache Quantization: Diagnosis and Mitigation
💰 Inference Cost
Content type: Academic
arxiv.org · 17 hours ago

Nvidia DGX Spark GB10 – AI Models and Guide with vLLM and Autonomous Script
⏱️ Prefill Decoding
Content type: Code
github.com · 5 days ago · Hacker News

2x GH200 for LLM inference, Part 2: vLLM, DeepSeek V4 Flash, and MTP
🧠 Inference Engineering
Content type: Blog
dnhkng.github.io · 2 days ago

Less-relevant results
Youssof Altoukhi (@Youssofal_)
🧠 Inference Engineering
xcancel.com · 3 days ago · r/LocalLLaMA

MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 TPS
💰 Inference Cost
Content type: Blog
mimo.xiaomi.com · 2 days ago · Hacker News, r/LocalLLaMA

A system programmer’s guide to LLM inference
💰 Inference Cost
Content type: Blog
blog.xiangpeng.systems · 2 days ago · Hacker News

DeepSeek V4, LeCun's Bet Against LLMs, and Lovable's Self-Improving Agent - The Tokenizer Edition #30
🧠 Inference Engineering
newsletter.artofsaience.com · 6 days ago

libertywing/FlashMemory-Deepseek-V4: FlashMemory DS-V4 Retriever: a lightweight retriever that sparsifies DeepSeek-V4 CSA KV-cache. Weights available on Hugging Face.
🧠 Inference Engineering
Content type: Code
github.com · 21 hours ago

DeepSeekV4 1.6T Day 0 to Day 43 Performance Over Time - Huawei, GB300 NVL72, MI355X, B200
🧠 Inference Engineering
Content type: News
newsletter.semianalysis.com · 1 day ago · Hacker News

Achieving Cloud-Grade SLOs for Local Mixture-of-Experts Inference through CPU-GPU Hybrid Design
💻 Systems Programming
Content type: Academic
arxiv.org · 17 hours ago

Apple rebuilt its on-device AI stack at WWDC 2026
🔢 GEMM Optimization
Content type: Blog
ziraph.com · 1 day ago · Hacker News

3x Faster Search: Parallel Test-Time Scaling with Instructed-Retriever-1
💰 Inference Cost
Content type: Blog
databricks.com · 6 days ago

The economics of speculative decoding
🚀 Speculative Decoding
Content type: Blog
fergusfinn.com · 2 days ago · Hacker News

{ "id": "247ea069-731d-4b79-9d64-8807463de95c", "revision": 0, "last_no
📡 OpenTelemetry
pastebin.com · 11 hours ago · r/StableDiffusion

not much happened today | AINews
🧠 Inference Engineering
news.smol.ai · 5 days ago

An 84-Format Numeric Catalog with Bit-Exact Conformance Vectors: A Vendor-Neutral Reference for FP8, BF16, MXFP4, and Microscaling Formats
🪄 Chiplet Design
Content type: Academic
arxiv.org · 1 day ago

Speculators v0.5.0: DFlash support and online training
🚀 Speculative Decoding
developers.redhat.com · 6 days ago

"North Mini Code"; open weights, 30B param, Canadian coding model
⏱️ Prefill Decoding
Content type: Blog
cohere.com · 1 day ago · Hacker News

Gigabyte AI Top 500: Local 600B Parameter LLM Desktop Training Hardware
🎮 GPU Computing
armdevices.net · 6 days ago