LLM Inference

MTP Isn't Always a Win: 1.95x on My 3090, but Speculative Decoding Is Hardware-Dependent

 🤖LLMs  Content type: Blog
bric.pe.kr··DEV
Less-relevant results

Ultrafast machine learning on FPGAs via Kolmogorov-Arnold Networks

 📈Performance Engineering

Quantization Was Never About the Bits

 🤖LLMs  Content type: Blog
medium.com·

MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better

 🤖LLMs  Content type: News  Content type: Blog

DiffusionGemma: Discrete diffusion in a large language model

 ✍️Prompt Engineering

Intelligent inference scheduling with llm-d on Red Hat AI

 ✍️Prompt Engineering

GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)

 🤖LLMs

How To Start Building Edge-Native AI

 📈Performance Engineering
semiengineering.com·

AI Serving Platform That Adapts to Your Model

 📈Performance Engineering  Content type: Blog
databricks.com·

Mi50 32GB / GFX906 - vLLM Qwen 3.5 Configuration for Qwen 3.5:9B AWQ-4bit

 🤖LLMs

HNSW vs LSH: How Elasticsearch hits 0.99 recall@10 at 15,000 QPS — and what it costs

 🧮Vector Databases  Content type: Blog
elastic.co·

Optimal Post-Training Quantization Scales and Where to Find Them

 🤖LLMs  Content type: Academic
arxiv.org·

Model2vec-zig: static text embeddings in pure Zig, in a single binary

 🤖LLMs
ziggit.dev·

The economics of speculative decoding

 📈Performance Engineering  Content type: Blog

vLLM Transformers Backend: Bridging Hugging Face Compatibility and High-Performance Inference

 🤖LLMs  Content type: Blog
odsc.medium.com·

2x GH200 for LLM inference, Part 2: vLLM, DeepSeek V4 Flash, and MTP

 🤖LLMs  Content type: Blog
dnhkng.github.io·

DiffusionGemma: The Developer Guide

 🤖LLMs  Content type: Blog

MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 TPS

 📈Performance Engineering  Content type: Blog

AMD's Lemonade SDK For Local AI Adds NVIDIA CUDA Support

 🤖LLMs

Google DeepMind releases Gemma 4 QAT, but Unsloth developer Daniel Han warns naive llama.cpp conversions suffer accuracy loss

 🤖LLMs  Content type: News
digg.com·
