Scour
🦙 llama.cpp
Specific
llama.cpp, local LLM, GGUF, CPU inference
Scoured 163 posts in 10.3 ms
RedToasty/llama.cpp_qts: Fixing --split-mode tensor, with different KV cache quantization types.
🤖 LLM Inference · github.com · 3d · r/LocalLLaMA
I tried 4 LLM speedup techniques on CPU. Three made it slower.
🤖 LLM Inference · deemwar-products.github.io · 10h · Hacker News
Luce DFlash + PFlash on 7900XTX: Qwen3.6-27B at 2.24x decode and 3.05x prefill vs llama.cpp HIP
🤖 LLM Inference · lucebox.com · 2d · r/LocalLLaMA
Benchmarking llama.cpp's brand-new MTP support on Strix Halo
🧠 Memory Allocators · calebcoffie.com · 2d · Hacker News
tvall43/Qwen3.5-14B-A3B-Claude-4.6-Opus-Reasoning-Distilled-reap-gguf at main
🤖 LLM Inference · huggingface.co · 18h · r/LocalLLaMA
Ollama vs vLLM vs llama.cpp: Which Wins for Your Use Case
🤖 LLM Inference · tildalice.io · 5d
Local LLMs are ready for real work
🤖 LLM Inference · thelurkreport.beehiiv.com · 2d · r/LocalLLaMA
GPU Memory Math for LLMs: Formula That Tells You What Fits on Your GPU
🤖 LLM Inference · theahmadosman.substack.com · 8h · Substack, r/LocalLLaMA
Find bugs in YOUR code using OpenCode, Llama.cpp and Qwen3.6
⚙️ Zig · wtarreau.blogspot.com · 3d · Lobsters, Hacker News, wtarreau.blogspot.com
HF downloader utility tampermonkey
🤖 LLM Inference · greasyfork.org · 2d · r/LocalLLaMA
LM Studio
🤖 LLM Inference · flathub.org · 6d
I replaced GitHub Copilot with a self-hosted AI and I won’t go back
⚙️ Zig · xda-developers.com · 10h
What's in a GGUF, besides the weights - and what's still missing?
🤖 LLM Inference · nobodywho.ooo · 6d · Hacker News, r/LocalLLaMA
Building a Controllable Inference Platform on Kubernetes with AI Runway
🤖 LLM Inference · techcommunity.microsoft.com · 2d
Best Local LLMs for Mac in 2026 — M1, M2, M3, M4 Tested
🧠 Memory Allocators · insiderllm.com · 4d
Ollama Cheat Sheet: Local LLMs, Models, API & Integration (2026)
🤖 LLM Inference · meshworld.in · 2d · DEV
Tokenizer Tampering
🤖 LLM Inference · hiddenlayer.com · 2d
nohurry/gemma-4-26B-A4B-it-heretic-GUFF
🤖 LLM Inference · huggingface.co · 14h
BrunoArsioli/llama-optimus: Lightweight Python tool using Optuna for tuning llama.cpp flags: towards optimal tok/s for your machine
🧠 Memory Allocators · github.com · 11h · r/LocalLLaMA
Tagging my blog posts with BERTopic and LLMs
🤖 LLM Inference · vickiboykis.com · 3d · Hacker News