⚡ Quantization - jhcha.oyo · Scour

ScaleSweep: Accurate NVFP4 Post-Training Quantization of LLMs via Block Scale Initialization

💬LLMs Academic

alexziskind1/model-shelf: Model Shelf is a local-first model resolver that helps AI agents and scripts find model weights on your own storage before downloading from Hugging Face. Point it at an internal SSD, NAS, external SSD, or Thunderbolt DAS, and it returns the best local path for GGUF, MLX, safetensors, Ollama, vLLM, and other local AI workflows.

🤖AI Code

Nvidia DGX Spark GB10 – AI Models and Guide with vLLM and Autonomous Script

💬LLMs Code

github.com · Hacker News

Sigma-Branch: Hierarchical Single-Path Network Reconstruction for Dynamic Inference with Reduced Active Parameters

🤖AI Academic

FADA: Accessible fetal ultrasound interpretation and annotation with a selectively distilled unified vision-language model

🧠Deep Learning Academic

john-rocky/apple-silicon-llm-bench: Neutral, reproducible benchmark for local LLMs on Apple Silicon (Mac · iPhone · iPad) — MLX, llama.cpp, CoreML, Apple Foundation Models

💬LLMs Code

github.com · Hacker News

Minimizing the Hidden Cost of Scales: Graph-Guided Ultra-Low-Bit Quantization for Large Language Models

💬LLMs Academic

harshuljain13/llm-inference-at-scale: A practitioner's handbook for production LLM serving.

🤖AI Code

github.com · Hacker News

FAIR-Calib: Frontier-Aware Instability-Reweighted Calibration for Post-Training Quantization of Diffusion Large Language Models

🎛️Fine-tuning Academic

Value-and-Structure Alignment for Routing-Consistent Quantization of Mixture-of-Experts Models

📊Vector Quantization Academic

147th airhacks tv: Local LLMs, LightMetal, ZSmith Agents, AI Rails, Saving Tokens

💬LLMs Blog

adambien.blog

Qwen3.6 + MTP: Calculated context size is smaller when I use `--spec-draft-type-* q4_0`. is this normal? · ggml-org llama.cpp · Discussion #24102

🤖AI Discussion Code

github.com · r/LocalLLaMA

LLMCodec: Adapting Video Codecs for Efficient Weight Compression of Large Language Models

💬LLMs Academic

Does anyone know what PCIe mode was used for these benchmarks?

💬LLMs Code

github.com · r/LocalLLaMA

Knowledge Distillation for Visual Autoregressive Models

👁️Computer Vision Academic

model: Granite4 Vision by gabe-l-hart · Pull Request #23545 · ggml-org/llama.cpp

🖥️GPU Programming Code

github.com · r/LocalLLaMA

iChristGit/comfyui-llamacpp-ideogram: ComfyUI Prompt enhancer for ideogram4 powered by llama cpp

🤖AI Code

github.com · r/StableDiffusion

SEAM: Shortcut-Aware Real-Time Detection of Scripted vs. Spontaneous Speech for Interview Guardrails

📈Optimization Academic

SecRL-Prune: Structured Reinforcement Learning-Based Pruning of CodeLLMs for Preserving Adversarial Code Mutation

🎮Reinforcement Learning Academic

TempoVLA: Learning Speed-Controllable Vision-Language-Action Policies

🎮Reinforcement Learning Academic
