LLM Quantization
Specific
quantization, GGUF, INT4, model compression, ternary inference
Remove padding and multiple D2D copies for MTP by gaugarg-nv · Pull Request #24086 · ggml-org/llama.cpp
🧠 Local llm · Content type: Code · github.com · 9 hours ago · r/LocalLLaMA
146th airhacks tv: Rust, Java 25, AI Agents, BCE, Web Components, zunit, zb
🧠 Local llm · Content type: Blog · adambien.blog · 1 day ago
Apples to Apples: MLX vs. Llama.cpp for Gemma 4 12B on an M1 16GB
🧠 Local llm · Content type: Blog · ziraph.com · 5 days ago · Hacker News
APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing
🧠 LLM Inference · Content type: Academic · arxiv.org · 1 day ago
Running Qwen 35B MoE at 450k Context on a Single 32GB GPU
🧠 LLM Inference · local-llm.utop.workers.dev · 3 days ago · Hacker News
Apple WWDC On-Device AI Deep Dive - Google Docs
🧠 LLM Inference · gist.is · 5 hours ago · Hacker News
stable-diffusion.cpp/docs/quantization_and_gguf.md at master · leejet/stable-diffusion.cpp
🧠 LLM Inference · Content type: Code · github.com · 3 days ago · r/StableDiffusion
Building & Benchmarking: LLMs on a 16GB Jetson Orin NX for Hermes Agent
🧠 LLM Inference · Content type: Blog · dnhkng.github.io · 2 days ago
Ideogram4 GGUF is out!
🧠 Local llm · huggingface.co · 3 days ago · r/StableDiffusion
Gemma 4 12B: A unified, encoder-free multimodal model
🧠 Local llm · Content type: Discussion · news.ycombinator.com · 3 days ago · Hacker News
alexziskind1/model-shelf: Model Shelf is a local-first model resolver that helps AI agents and scripts find model weights on your own storage before downloading from Hugging Face. Point it at an internal SSD, NAS, external SSD, or Thunderbolt DAS, and it returns the best local path for GGUF, MLX, safetensors, Ollama, vLLM, and other local AI workflows.
🧠 Local llm · Content type: Code · github.com · 5 days ago
SpectrumKV: Per-Token Mixed-Precision KV Cache Transfer for Prefill-Decode Disaggregated LLM Serving
🧠 LLM Inference · Content type: Academic · arxiv.org · 1 day ago
BeeLlama.cpp DFlash on Strix Halo: 2.7x Gemma 31B, But MTP Is Still Faster
🧠 Local llm · sleepingrobots.com · 4 days ago
A system programmer’s guide to LLM inference
🧠 LLM Inference · Content type: Blog · blog.xiangpeng.systems · 3 days ago · Hacker News
mtmd : add video input support by ngxson · Pull Request #24269 · ggml-org/llama.cpp
🧠 Local llm · Content type: Code · github.com · 2 days ago · r/LocalLLaMA
mtp: support for gemma-4 E2B and E4B assistants by max-krasnyansky · Pull Request #24282 · ggml-org/llama.cpp
🧠 Local llm · Content type: Code · github.com · 2 days ago · r/LocalLLaMA
OpenRTLSet: A Fully Open-Source Dataset for Large Language Model-based Verilog Module Design
🤖 Qwen · Content type: Academic · arxiv.org · 23 hours ago
Dew Drop - June 8, 2026 (#4685)
🧠 Local llm · alvinashcraft.com · 2 days ago
Launch HN: General Instinct (YC P26) – Frontier models on edge devices
🧠 Local llm · Content type: Discussion · news.ycombinator.com · 5 days ago · Hacker News
john-rocky/apple-silicon-llm-bench: Neutral, reproducible benchmark for local LLMs on Apple Silicon (Mac · iPhone · iPad) — MLX, llama.cpp, CoreML, Apple Foundation Models
🤖 Qwen · Content type: Code · github.com · 6 days ago · Hacker News