Scour
LocalLlama
reddit.com
I got 3× faster HFQ4 prefill on Strix Halo in hipfire with an opt-in MMQ path
github.com · 1w · r/LocalLLaMA
heardlabs/heard: A voice companion for AI coding agents. Speaks your agent's replies so you can keep working.
github.com · 1w · r/LocalLLaMA
Introducing talkie: a 13B vintage language model from 1930
talkie-lm.com · 1w · Lobsters, Hacker News, r/LLM, r/LocalLLaMA, r/singularity
Qwen 3.6-35B-A3B KV cache bench: f16 vs q8_0 vs turbo3 vs turbo4 from 0 to 1M context on M5 Max
llmkube.com · 1w · r/LocalLLaMA
End-2-end tutorial on fine-tuning, the whole journey
docs.liquid.ai · 1w · r/LocalLLaMA
XiaomiMiMo/MiMo-V2.5-Pro
huggingface.co · 1w · Hacker News, r/LocalLLaMA
Simple to use vLLM Docker Container for Qwen3.6 27b with Lorbus AutoRound INT4 quant and MTP speculative decoding - 118 tokens/second on 2x 3090s
github.com · 1w · r/LocalLLaMA
Skymizer Taiwan Inc. Unveils Breakthrough Architecture Enabling Ultra-Large LLM Inference on a Single Card
en.prnasia.com · 1w · r/LocalLLaMA
aiptimizer/TurboOCR: Fast GPU OCR server. 270 img/s on FUNSD. TensorRT FP16, PP-OCRv5, HTTP + gRPC.
github.com · 3w · Hacker News, r/LocalLLaMA
How to run a local coding agent with Gemma 4 and Pi
patloeber.com · 1w · Hacker News, r/LocalLLaMA
shreyansh26/Speculative-Decoding: Speculative Decoding Implementations: EAGLE-3, Medusa-1, PARD, Draft Models, N-gram and Suffix Decoding from scratch
github.com · 1w · r/LLM, r/LocalLLaMA
Why SWE-bench Verified no longer measures frontier coding capabilities
openai.com · 1w · Hacker News, r/LocalLLaMA
waybarrios/opencode-power-pack: Eleven Claude Code skills ported to OpenCode: code-review, security-review, feature-dev, frontend-design + 7 more. One config line, one plugin.
github.com · 1w · Hacker News, r/LLM, r/LocalLLaMA
Hash anchors and Myers diff and single-token anchors: 60% cheaper AI code edits
dirac.run · 1w · Hacker News, r/LocalLLaMA
Qwen3.6 35B A3B Heretic (KLD 0.0015!) Incredible model. Best 35B I have found!
huggingface.co · 1w · r/LocalLLaMA
Qwen3.6-27B-INT4 clocking 100 tps with 256k context length on 1x RTX 5090 via vllm 0.19
huggingface.co · 1w · r/LocalLLaMA
DeepSeek_V4.pdf · deepseek-ai/DeepSeek-V4-Pro at main
huggingface.co · 1w · Hacker News, r/LocalLLaMA
Pi.dev: There are many coding agents, but this one is mine
pi.dev · 11w · Hacker News, r/LocalLLaMA
🛡️ Shield 82M: A PII stripping/filtering model 🛡️
huggingface.co · 1w · r/LocalLLaMA
Qwen3.6-27B at ~80 tps with 218k context window on 1x RTX 5090 served by vllm 0.19
huggingface.co · 1w · r/LocalLLaMA