Quantization

ScaleSweep: Accurate NVFP4 Post-Training Quantization of LLMs via Block Scale Initialization

 💬LLMs  Content type: Academic
arxiv.org

alexziskind1/model-shelf: Model Shelf is a local-first model resolver that helps AI agents and scripts find model weights on your own storage before downloading from Hugging Face. Point it at an internal SSD, NAS, external SSD, or Thunderbolt DAS, and it returns the best local path for GGUF, MLX, safetensors, Ollama, vLLM, and other local AI workflows.

 🤖AI  Content type: Code
github.com

Nvidia DGX Spark GB10 – AI Models and Guide with vLLM and Autonomous Script

 💬LLMs  Content type: Code
github.com · Hacker News

Sigma-Branch: Hierarchical Single-Path Network Reconstruction for Dynamic Inference with Reduced Active Parameters

 🤖AI  Content type: Academic
arxiv.org

FADA: Accessible fetal ultrasound interpretation and annotation with a selectively distilled unified vision-language model

 🧠Deep Learning  Content type: Academic
arxiv.org

john-rocky/apple-silicon-llm-bench: Neutral, reproducible benchmark for local LLMs on Apple Silicon (Mac · iPhone · iPad) — MLX, llama.cpp, CoreML, Apple Foundation Models

 💬LLMs  Content type: Code
github.com · Hacker News

Minimizing the Hidden Cost of Scales: Graph-Guided Ultra-Low-Bit Quantization for Large Language Models

 💬LLMs  Content type: Academic
arxiv.org

harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.

 🤖AI  Content type: Code
github.com · Hacker News

FAIR-Calib: Frontier-Aware Instability-Reweighted Calibration for Post-Training Quantization of Diffusion Large Language Models

 🎛️Fine-tuning  Content type: Academic
arxiv.org

Value-and-Structure Alignment for Routing-Consistent Quantization of Mixture-of-Experts Models

 📊Vector Quantization  Content type: Academic
arxiv.org

147th airhacks tv: Local LLMs, LightMetal, ZSmith Agents, AI Rails, Saving Tokens

 💬LLMs  Content type: Blog
adambien.blog

Qwen3.6 + MTP: Calculated context size is smaller when I use `--spec-draft-type-* q4_0`. is this normal? · ggml-org llama.cpp · Discussion #24102

 🤖AI  Content type: Discussion, Code
github.com · r/LocalLLaMA

LLMCodec: Adapting Video Codecs for Efficient Weight Compression of Large Language Models

 💬LLMs  Content type: Academic
arxiv.org

Does anyone know what PCIe mode was used for these benchmarks?

 💬LLMs  Content type: Code
github.com · r/LocalLLaMA

Knowledge Distillation for Visual Autoregressive Models

 👁️Computer Vision  Content type: Academic
arxiv.org

model: Granite4 Vision by gabe-l-hart · Pull Request #23545 · ggml-org/llama.cpp

 🖥️GPU Programming  Content type: Code
github.com · r/LocalLLaMA

iChristGit/comfyui-llamacpp-ideogram: ComfyUI Prompt enhancer for ideogram4 powered by llama cpp

 🤖AI  Content type: Code

SEAM: Shortcut-Aware Real-Time Detection of Scripted vs. Spontaneous Speech for Interview Guardrails

 📈Optimization  Content type: Academic
arxiv.org

SecRL-Prune: Structured Reinforcement Learning-Based Pruning of CodeLLMs for Preserving Adversarial Code Mutation

 🎮Reinforcement Learning  Content type: Academic
arxiv.org

TempoVLA: Learning Speed-Controllable Vision-Language-Action Policies

 🎮Reinforcement Learning  Content type: Academic
arxiv.org
