🦙 llama.cpp - anarcher · Scour

RedToasty/llama.cpp_qts: Fixing --split-mode tensor, with different KV cache quantization types. 🤖LLM Inference

github.com·3d·r/LocalLLaMA

I tried 4 LLM speedup techniques on CPU. Three made it slower. 🤖LLM Inference

deemwar-products.github.io·10h·Hacker News

Luce DFlash + PFlash on 7900XTX: Qwen3.6-27B at 2.24x decode and 3.05x prefill vs llama.cpp HIP 🤖LLM Inference

lucebox.com·2d·r/LocalLLaMA

Benchmarking llama.cpp's brand-new MTP support on Strix Halo 🧠Memory Allocators

calebcoffie.com·2d·Hacker News

tvall43/Qwen3.5-14B-A3B-Claude-4.6-Opus-Reasoning-Distilled-reap-gguf at main 🤖LLM Inference

huggingface.co·18h·r/LocalLLaMA

Ollama vs vLLM vs llama.cpp: Which Wins for Your Use Case 🤖LLM Inference

tildalice.io·5d

Local LLMs are ready for real work 🤖LLM Inference

thelurkreport.beehiiv.com·2d·r/LocalLLaMA

GPU Memory Math for LLMs: Formula That Tells You What Fits on Your GPU 🤖LLM Inference

theahmadosman.substack.com·8h·Substack, r/LocalLLaMA

Find bugs in YOUR code using OpenCode, Llama.cpp and Qwen3.6 ⚙️Zig

wtarreau.blogspot.com·3d·Lobsters, Hacker News, wtarreau.blogspot.com

HF downloader utility tampermonkey 🤖LLM Inference

greasyfork.org·2d·r/LocalLLaMA

LM Studio 🤖LLM Inference

flathub.org·6d

I replaced GitHub Copilot with a self-hosted AI and I won’t go back ⚙️Zig

xda-developers.com·10h

What's in a GGUF, besides the weights - and what's still missing? 🤖LLM Inference

nobodywho.ooo·6d·Hacker News, r/LocalLLaMA

Building a Controllable Inference Platform on Kubernetes with AI Runway 🤖LLM Inference

techcommunity.microsoft.com·2d

Best Local LLMs for Mac in 2026 — M1, M2, M3, M4 Tested 🧠Memory Allocators

insiderllm.com·4d

Ollama Cheat Sheet: Local LLMs, Models, API & Integration (2026) 🤖LLM Inference

meshworld.in·2d·DEV

Tokenizer Tampering 🤖LLM Inference

hiddenlayer.com·2d

nohurry/gemma-4-26B-A4B-it-heretic-GUFF 🤖LLM Inference

huggingface.co·14h

BrunoArsioli/llama-optimus: Lightweight Python tool using Optuna for tuning llama.cpp flags: towards optimal tok/s for your machine 🧠Memory Allocators

github.com·11h·r/LocalLLaMA

Tagging my blog posts with BERTopic and LLMs 🤖LLM Inference

vickiboykis.com·3d·Hacker News
