⚡ Quantization - ibrahimsharaf · Scour

RedToasty/llama.cpp_qts: Fixing --split-mode tensor, with different KV cache quantization types. 🔓Open Source AI

github.com·3d·r/LocalLLaMA

GPU Memory Math for LLMs: Formula That Tells You What Fits on Your GPU 🚀LLM Deployment

theahmadosman.substack.com·8h·Substack, r/LocalLLaMA

DiRotQ: Rotation-Aware Quantization for 4-bit Diffusion Transformers ⚙️Transformers

Why Shrinking an AI Model Often Makes It More Useful 🏢LLM Adoption

siliconopera.com·20h

Luce DFlash + PFlash on 7900XTX: Qwen3.6-27B at 2.24x decode and 3.05x prefill vs llama.cpp HIP 🎯LLM Finetuning

lucebox.com·2d·r/LocalLLaMA

DreamFast/Qwen3.6-27B-Uncensored-HauhauCS-Aggressive-Safetensor-Benchmark 🎯LLM Finetuning

huggingface.co·3d

Benchmarking llama.cpp's brand-new MTP support on Strix Halo 🎯LLM Finetuning

calebcoffie.com·2d·Hacker News

Ollama vs vLLM vs llama.cpp: Which Wins for Your Use Case 🚀LLM Deployment

tildalice.io·5d

Science and Technology News and Commentary: Aardvark Daily 💻Local AI

aardvark.co.nz·15h

HF downloader utility tampermonkey 🔓Open Source AI

greasyfork.org·2d·r/LocalLLaMA

Find bugs in YOUR code using OpenCode, Llama.cpp and Qwen3.6 💻Local AI

wtarreau.blogspot.com·3d·Lobsters, Hacker News, wtarreau.blogspot.com

Command A+: Making sovereign agentic capabilities available to all 🤖AI Agents

cohere.com·12h·Hacker News

Unleashing Blackwell's 4-bit: a surgical look at MXFP4 and NVFP4 🎯LLM Finetuning

emre570.bearblog.dev·1d

Can You Run LLMs Locally Without a GPU? I Tested 8 Models on Linux 🎯LLM Finetuning

itsfoss.com·5d·Hacker News

Building a Controllable Inference Platform on Kubernetes with AI Runway 🚀LLM Deployment

techcommunity.microsoft.com·2d

qskousen/ggufy: CLI/GUI tool for efficient and easy safetensors and gguf model conversion 🎯LLM Finetuning

github.com·3h·r/StableDiffusion

Tokenizer Tampering 🧪Synthetic Data

hiddenlayer.com·2d

What's in a GGUF, besides the weights - and what's still missing? 🧠LLMs

nobodywho.ooo·6d·Hacker News, r/LocalLLaMA

tvall43/Qwen3.5-14B-A3B-Claude-4.6-Opus-Reasoning-Distilled-reap-gguf at main 💻Local AI

huggingface.co·18h·r/LocalLLaMA

Qwen 3.7 Preview 🚀LLM Deployment

news.ycombinator.com·2d·Hacker News
