Scour
⚡ Quantization
Model Compression, INT8, Weight Quantization
Scoured 99 posts in 6.2 ms
ScaleSweep: Accurate NVFP4 Post-Training Quantization of LLMs via Block Scale Initialization
💬 LLMs · Academic · arxiv.org · 2 days ago
alexziskind1/model-shelf: Model Shelf is a local-first model resolver that helps AI agents and scripts find model weights on your own storage before downloading from Hugging Face. Point it at an internal SSD, NAS, external SSD, or Thunderbolt DAS, and it returns the best local path for GGUF, MLX, safetensors, Ollama, vLLM, and other local AI workflows.
🤖 AI · Code · github.com · 5 days ago
Nvidia DGX Spark GB10 – AI Models and Guide with vLLM and Autonomous Script
💬 LLMs · Code · github.com · 5 days ago · Hacker News
Sigma-Branch: Hierarchical Single-Path Network Reconstruction for Dynamic Inference with Reduced Active Parameters
🤖 AI · Academic · arxiv.org · 1 day ago
FADA: Accessible fetal ultrasound interpretation and annotation with a selectively distilled unified vision-language model
🧠 Deep Learning · Academic · arxiv.org · 1 day ago
john-rocky/apple-silicon-llm-bench: Neutral, reproducible benchmark for local LLMs on Apple Silicon (Mac · iPhone · iPad) — MLX, llama.cpp, CoreML, Apple Foundation Models
💬 LLMs · Code · github.com · 6 days ago · Hacker News
Minimizing the Hidden Cost of Scales: Graph-Guided Ultra-Low-Bit Quantization for Large Language Models
💬 LLMs · Academic · arxiv.org · 6 days ago
harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.
🤖 AI · Code · github.com · 4 days ago · Hacker News
FAIR-Calib: Frontier-Aware Instability-Reweighted Calibration for Post-Training Quantization of Diffusion Large Language Models
🎛️ Fine-tuning · Academic · arxiv.org · 3 days ago
Value-and-Structure Alignment for Routing-Consistent Quantization of Mixture-of-Experts Models
📊 Vector Quantization · Academic · arxiv.org · 6 days ago
147th airhacks tv: Local LLMs, LightMetal, ZSmith Agents, AI Rails, Saving Tokens
💬 LLMs · Blog · adambien.blog · 1 day ago
Qwen3.6 + MTP: Calculated context size is smaller when I use `--spec-draft-type-* q4_0`. is this normal? · ggml-org llama.cpp · Discussion #24102
🤖 AI · Discussion, Code · github.com · 6 days ago · r/LocalLLaMA
LLMCodec: Adapting Video Codecs for Efficient Weight Compression of Large Language Models
💬 LLMs · Academic · arxiv.org · 6 days ago
Does anyone know what PCIe mode was used for these benchmarks?
💬 LLMs · Code · github.com · 4 days ago · r/LocalLLaMA
Knowledge Distillation for Visual Autoregressive Models
👁️ Computer Vision · Academic · arxiv.org · 6 days ago
model: Granite4 Vision by gabe-l-hart · Pull Request #23545 · ggml-org/llama.cpp
🖥️ GPU Programming · Code · github.com · 5 days ago · r/LocalLLaMA
iChristGit/comfyui-llamacpp-ideogram: ComfyUI Prompt enhancer for ideogram4 powered by llama cpp
🤖 AI · Code · github.com · 3 days ago · r/StableDiffusion
SEAM: Shortcut-Aware Real-Time Detection of Scripted vs. Spontaneous Speech for Interview Guardrails
📈 Optimization · Academic · arxiv.org · 3 days ago
SecRL-Prune: Structured Reinforcement Learning-Based Pruning of CodeLLMs for Preserving Adversarial Code Mutation
🎮 Reinforcement Learning · Academic · arxiv.org · 6 days ago
TempoVLA: Learning Speed-Controllable Vision-Language-Action Policies
🎮 Reinforcement Learning · Academic · arxiv.org · 6 days ago