Quantization of LLMs
Scoured 75 posts in 6.5 ms

GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)
✨ Model optimizations in LLMs
vettedconsumer.com · 5 days ago · Hacker News

Qwen 3.6 27B AutoRound GGUF, need your feedback
✨ Model optimizations in LLMs
huggingface.co · 2 days ago · r/LocalLLaMA

TileFuse: A Fused Mixed-Precision Kernel Library for Efficient Quantized LLM Inference on AMD NPUs
🔧 Systems-level optimizations for LLM serving
Content type: Academic
arxiv.org · 18 hours ago

Orchestrate your LLM pipeline. Locally
🧠 Large Language Models (LLMs)
llmforge.app · 5 hours ago · Hacker News

lightmetal: GPU LLM Inference From a Single Java 25 JAR
🧠 Large Language Models (LLMs)
Content type: Blog
adambien.blog · 2 days ago

Improved performance and model support with GGUF
🚀 LLM serving frameworks
Content type: Blog
ollama.com · 6 days ago

Ask HN: What's the best LLM model that on a 24 GB VRAM GPU?
🌐 Distributed LLM Systems
Content type: Discussion
news.ycombinator.com · 3 hours ago · Hacker News

Ollama 0.30 GPU Boost: Faster local Qwen inference on NVIDIA
🚀 LLM serving frameworks
everylocalai.com · 1 day ago · DEV

local llm on laptop 780M GPU using llama + gemma 4 qat
🧠 Large Language Models (LLMs)
Content type: Blog
alper.bearblog.dev · 5 days ago
Less-relevant results

Model2vec-zig: static text embeddings in pure Zig, in a single binary
✨ Model optimizations in LLMs
ziggit.dev · 2 hours ago

MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better
✨ Model optimizations in LLMs
Content type: News, Blog
kaitchup.substack.com · 6 days ago · r/LocalLLaMA

MTP Isn't Always a Win: 1.95x on My 3090, but Speculative Decoding Is Hardware-Dependent
🧠 Large Language Models (LLMs)
Content type: Blog
bric.pe.kr · 2 days ago · DEV

Unsloth Gemma 4 QAT
✨ Model optimizations in LLMs
unsloth.ai · 6 days ago

DeskDash - a free Windows tool to easily manage your GGUF files
💬 Prompt optimizations for LLM serving
gerry7.itch.io · 4 days ago · r/LocalLLaMA

alexziskind1/model-shelf: Model Shelf is a local-first model resolver that helps AI agents and scripts find model weights on your own storage before downloading from Hugging Face. Point it at an internal SSD, NAS, external SSD, or Thunderbolt DAS, and it returns the best local path for GGUF, MLX, safetensors, Ollama, vLLM, and other local AI workflows.
🤖 Agents using LLMs
Content type: Code
github.com · 6 days ago

DiffusionGemma 26B A4B results on my 5090
🧠 Large Language Models (LLMs)
huggingface.co · 1 day ago · r/LocalLLaMA

Show HN: Run Llama.cpp In-Process from Java with Project Panama FFM
🧠 Large Language Models (LLMs)
deemwar-products.github.io · 6 days ago · Hacker News

147th airhacks tv: Local LLMs, LightMetal, ZSmith Agents, AI Rails, Saving Tokens
🧠 Large Language Models (LLMs)
Content type: Blog
adambien.blog · 1 day ago

Ollama 0.30 delivers faster NVIDIA GPU performance and wider hardware support
🚀 LLM serving frameworks
alternativeto.net · 3 days ago

TurboQuant in PostgreSQL
🔍 Retrieval-augmented generation
Content type: Blog
blog.mayflower.de · 13 hours ago