🔢 Quantization of LLMs - pleto · Scour

GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)

✨Model optimizations in LLMs

vettedconsumer.com · Hacker News

Qwen 3.6 27B AutoRound GGUF, need your feedback

✨Model optimizations in LLMs

huggingface.co · r/LocalLLaMA

TileFuse: A Fused Mixed-Precision Kernel Library for Efficient Quantized LLM Inference on AMD NPUs

🔧Systems-level optimizations for LLM serving · Academic

lightmetal: GPU LLM Inference From a Single Java 25 JAR

🧠Large Language Models (LLMs) · Blog

adambien.blog

Orchestrate your LLM pipeline. Locally

🧠Large Language Models (LLMs)

llmforge.app · Hacker News

local llm on laptop 780M GPU using llama + gemma 4 qat

🧠Large Language Models (LLMs) · Blog

alper.bearblog.dev

Ollama 0.30 GPU Boost: Faster local Qwen inference on NVIDIA

🚀LLM serving frameworks

everylocalai.com · DEV

Ask HN: What's the best LLM model that on a 24 GB VRAM GPU?

🌐Distributed LLM Systems · Discussion

news.ycombinator.com · Hacker News

MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better

✨Model optimizations in LLMs · News · Blog

kaitchup.substack.com · r/LocalLLaMA

Less-relevant results

Model2vec-zig: static text embeddings in pure Zig, in a single binary

✨Model optimizations in LLMs

Unsloth Gemma 4 QAT

✨Model optimizations in LLMs

DeskDash - a free Windows tool to easily manage your GGUF files

💬Prompt optimizations for LLM serving

gerry7.itch.io · r/LocalLLaMA

alexziskind1/model-shelf: Model Shelf is a local-first model resolver that helps AI agents and scripts find model weights on your own storage before downloading from Hugging Face. Point it at an internal SSD, NAS, external SSD, or Thunderbolt DAS, and it returns the best local path for GGUF, MLX, safetensors, Ollama, vLLM, and other local AI workflows.

🤖Agents using LLMs · Code

DiffusionGemma 26B A4B results on my 5090

🧠Large Language Models (LLMs)

huggingface.co · r/LocalLLaMA

Show HN: Run Llama.cpp In-Process from Java with Project Panama FFM

🧠Large Language Models (LLMs)

deemwar-products.github.io · Hacker News

147th airhacks tv: Local LLMs, LightMetal, ZSmith Agents, AI Rails, Saving Tokens

🧠Large Language Models (LLMs) · Blog

adambien.blog

Ollama 0.30 delivers faster NVIDIA GPU performance and wider hardware support

🚀LLM serving frameworks

alternativeto.net

TurboQuant in PostgreSQL

🔍Retrieval-augmented generation · Blog

blog.mayflower.de

A system programmer’s guide to LLM inference

🔧Systems-level optimizations for LLM serving · Blog

blog.xiangpeng.systems · Hacker News

Quality Is Not a Safety Proxy Under Quantization

✨Model optimizations in LLMs · Academic