⚡ Speculative Decoding - jhcha.oyo · Scour

AdaPLD: Adaptive Retrieval and Reuse for Efficient Model-Free Speculative Decoding

⚡Quantization Academic

Less-relevant results

MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 TPS

⚡Quantization Blog

mimo.xiaomi.com··Hacker News, r/LocalLLaMA

defai-digital/ax-engine: Apple Silicon LLM runtime supporting Gemma 4 and Qwen 3.6 MTP modes

🤖AI Code

github.com··Hacker News

Making LLMs faster and more efficient across multiple languages

techxplore.com·

Here's a llama.cpp CLI Command builder.

llamabuilding.com··r/LocalLLaMA

Qwen 3.6 27B AutoRound GGUF, need your feedback

⚡Quantization

huggingface.co··r/LocalLLaMA

a local Windows app for interview prep and mock practice

📈Optimization

ofarwise.com··Hacker News

Google Gemma 4 12B: Architecture, Benchmarks, Access, and Hands-on Guide for Developers

💬LLMs Blog

analyticsvidhya.com·

Measuring Embedding Drift: Why Hybrid Search Saves Stale Models.

🎯Fine-Tuning

pub.towardsai.net·

K-Forcing: Joint Next-K-Token Decoding via Push-Forward Language Modeling

💬LLMs Academic

harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.

🤖AI Code

github.com··Hacker News

MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better

⚡Quantization News Blog

kaitchup.substack.com··r/LocalLLaMA

bigattichouse/packed-twin-inference: PTI achieves ~2× throughput using a single quantized model (Q5_K_M or better) by running 4 generation streams in one batched decode call. The GPU loads model weights once per step and produces 4 predictions simultaneously. KV cache overhead is ~0.8 GiB total for all 4 streams. No draft model. No quality loss

💬LLMs Code

github.com··r/LocalLLaMA

[AINews] not much happened today

📉Technical Analysis News

·

Imbuing Large Language Models with Bidirectional Logic for Robust Chain Repair

🤖AI Academic

[PoC] server: support requantizing kv cache by wadealexc · Pull Request #24134 · ggml-org/llama.cpp

💬LLMs Code

github.com··r/LocalLLaMA

