Quantization


GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)

 🔓Open-source Models

Quality Is Not a Safety Proxy Under Quantization

 🔓Open-source Models  Content type: Academic
arxiv.org·

Ollama 0.30 GPU Boost: Faster local Qwen inference on NVIDIA

 🔓Open-source Models
everylocalai.com··DEV

lightmetal: GPU LLM Inference From a Single Java 25 JAR

 🧠LLMs  Content type: Blog
adambien.blog·

alexziskind1/model-shelf: Model Shelf is a local-first model resolver that helps AI agents and scripts find model weights on your own storage before downloading from Hugging Face. Point it at an internal SSD, NAS, external SSD, or Thunderbolt DAS, and it returns the best local path for GGUF, MLX, safetensors, Ollama, vLLM, and other local AI workflows.

 🔓Open-source Models  Content type: Code
github.com·

DiffusionGemma 26B A4B results on my 5090

 🔓Open-source Models

Improved performance and model support with GGUF

 🧠LLMs  Content type: Blog
ollama.com·

MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better

 🔓Open-source Models  Content type: News  Content type: Blog

Less-relevant results

MTP Isn't Always a Win: 1.95x on My 3090, but Speculative Decoding Is Hardware-Dependent

 🧠LLMs  Content type: Blog
bric.pe.kr··DEV

Orchestrate your LLM pipeline. Locally

 🔓Open-source Models
llmforge.app··Hacker News

DeskDash - a free Windows tool to easily manage your GGUF files

 🧠LLMs

local llm on laptop 780M GPU using llama + gemma 4 qat

 🧠LLMs  Content type: Blog
alper.bearblog.dev·

Unsloth Gemma 4 QAT

 🔓Open-source Models
unsloth.ai·

TileFuse: A Fused Mixed-Precision Kernel Library for Efficient Quantized LLM Inference on AMD NPUs

 🖥️Inference Compute  Content type: Academic
arxiv.org·

146th airhacks tv: Rust, Java 25, AI Agents, BCE, Web Components, zunit, zb

 🕵️AI Agents  Content type: Blog
adambien.blog·

Show HN: Run Llama.cpp In-Process from Java with Project Panama FFM

 🧠LLMs

Qwen 3.6 27B AutoRound GGUF, need your feedback

 🔓Open-source Models

A system programmer’s guide to LLM inference

 🔓Open-source Models  Content type: Blog

2x GH200 for LLM inference, Part 2: vLLM, DeepSeek V4 Flash, and MTP

 🧠LLMs  Content type: Blog
dnhkng.github.io·

Ollama 0.30 delivers faster NVIDIA GPU performance and wider hardware support

 🧠LLMs
alternativeto.net·
