⚡ Speculative Decoding - jhcha.oyo · Scour

AdaPLD: Adaptive Retrieval and Reuse for Efficient Model-Free Speculative Decoding

⚡Quantization Academic

Less-relevant results

The economics of speculative decoding

📈Algorithmic Trading Blog

fergusfinn.com··Hacker News

A system programmer’s guide to LLM inference

🔤Tokenization Blog

blog.xiangpeng.systems··Hacker News

harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.

🤖AI Code

github.com··Hacker News

MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 TPS

⚡Quantization Blog

mimo.xiaomi.com··Hacker News, r/LocalLLaMA

Speculators v0.5.0: DFlash support and online training

developers.redhat.com·

GoCritic! Review: Eeny, Meeny, Miny, Moe! - GoCritic! - Anifilm Liberec 2026

cineuropa.org·

China's Xiaomi MiMo Is Now 15X Faster Than ChatGPT and Claude (4 minute read)

💬LLMs News

BeeLlama.cpp DFlash on Strix Halo: 2.7x Gemma 31B, But MTP Is Still Faster

🎮Game Engines

sleepingrobots.com·

Making LLMs faster and more efficient across multiple languages

techxplore.com·

Here's a llama.cpp CLI Command builder.

llamabuilding.com··r/LocalLLaMA

Nutrient control enables metabolic reconstruction of L. rhamnosus GG and analysis of secretions

📡Science Communication Academic

Xiaomi MiMo-V2.5-Pro Just Hit 1,000 Tokens Per Second!

💬Natural Language Processing

ViaTunisia subsea segment reaches ready-for-service status

🎮Game Design News

computerweekly.com·

K-Forcing: Joint Next-K-Token Decoding via Push-Forward Language Modeling

💬LLMs Academic

Photo Friday: Our valiant steeds

👁️Computer Vision

Qwen 3.6 27B AutoRound GGUF, need your feedback

⚡Quantization

huggingface.co··r/LocalLLaMA

a local Windows app for interview prep and mock practice

📈Optimization

ofarwise.com··Hacker News

defai-digital/ax-engine: Apple Silicon LLM runtime supporting Gemma 4 and Qwen 3.6 MTP modes

🤖AI Code

github.com··Hacker News

Jason McDonald

✍️Prompt Engineering

theamericanscholar.org·
