Quantization of LLMs


GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)

 Model optimizations in LLMs

Qwen 3.6 27B AutoRound GGUF, need your feedback

 Model optimizations in LLMs

TileFuse: A Fused Mixed-Precision Kernel Library for Efficient Quantized LLM Inference on AMD NPUs

 🔧Systems-level optimizations for LLM serving  Content type: Academic
arxiv.org

Orchestrate your LLM pipeline. Locally

 🧠Large Language Models (LLMs)
llmforge.app · Hacker News

lightmetal: GPU LLM Inference From a Single Java 25 JAR

 🧠Large Language Models (LLMs)  Content type: Blog
adambien.blog

Improved performance and model support with GGUF

 🚀LLM serving frameworks  Content type: Blog
ollama.com

Ask HN: What's the best LLM model that runs on a 24 GB VRAM GPU?

 🌐Distributed LLM Systems  Content type: Discussion

Ollama 0.30 GPU Boost: Faster local Qwen inference on NVIDIA

 🚀LLM serving frameworks
everylocalai.com · DEV

local llm on laptop 780M GPU using llama + gemma 4 qat

 🧠Large Language Models (LLMs)  Content type: Blog
alper.bearblog.dev

Less-relevant results

Model2vec-zig: static text embeddings in pure Zig, in a single binary

 Model optimizations in LLMs
ziggit.dev

MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better

 Model optimizations in LLMs  Content type: News  Content type: Blog

MTP Isn't Always a Win: 1.95x on My 3090, but Speculative Decoding Is Hardware-Dependent

 🧠Large Language Models (LLMs)  Content type: Blog
bric.pe.kr · DEV

Unsloth Gemma 4 QAT

 Model optimizations in LLMs
unsloth.ai

DeskDash - a free Windows tool to easily manage your GGUF files

 💬Prompt optimizations for LLM serving

alexziskind1/model-shelf: Model Shelf is a local-first model resolver that helps AI agents and scripts find model weights on your own storage before downloading from Hugging Face. Point it at an internal SSD, NAS, external SSD, or Thunderbolt DAS, and it returns the best local path for GGUF, MLX, safetensors, Ollama, vLLM, and other local AI workflows.

 🤖Agents using LLMs  Content type: Code
github.com

DiffusionGemma 26B A4B results on my 5090

 🧠Large Language Models (LLMs)

Show HN: Run Llama.cpp In-Process from Java with Project Panama FFM

 🧠Large Language Models (LLMs)

147th airhacks tv: Local LLMs, LightMetal, ZSmith Agents, AI Rails, Saving Tokens

 🧠Large Language Models (LLMs)  Content type: Blog
adambien.blog

Ollama 0.30 delivers faster NVIDIA GPU performance and wider hardware support

 🚀LLM serving frameworks
alternativeto.net

TurboQuant in PostgreSQL

 🔍Retrieval-augmented generation  Content type: Blog
blog.mayflower.de
