Scour
LocalLlama · reddit.com
has anyone tried this? Flash-MoE: Running a 397B Parameter Model on a Laptop
github.com · 6w · r/LocalLLaMA
ian-hailey/vllm-docker-Qwen3-5-122B-A10B-NVFP4: Docker container config for launching Qwen3.5-122B-A10B-NVFP4 with vLLM
github.com · 6w · r/LocalLLaMA
schutzpunkt/strix-halo-ai-stack: Ansible playbook to configure AMD Strix Halo machines (e.g. Framework Desktop or GMKtec EVO-X2) as local AI inference servers running Fedora 43. Sets up llama.cpp with llama-swap and Open WebUI, downloads GGUF models, and includes an NGINX reverse proxy with TLS via ACME or a self-signed certificate.
github.com · 6w · r/LocalLLaMA
woct0rdho/ComfyUI-FeatherOps: Fast fp16-fp8 mixed-precision matmul on RDNA3/3.5 GPUs without native fp8
github.com · 6w · r/LocalLLaMA, r/StableDiffusion
reverse/autoresearch: AI agents that automatically run research experiments on single-GPU nanochat training
github.com · 6w · r/LocalLLaMA
ik_llama.cpp gives 26x faster prompt processing on Qwen 3.5 27B
github.com · 6w · r/LocalLLaMA
Show HN: AI agents go on blind dates and leave each other voicemails
lobsterdate.com · 6w · Hacker News, r/LocalLLaMA
The Reasoning Bottleneck in Graph-RAG: Structured Prompting and Context Compression for Multi-Hop QA
arxiv.org · 7w · r/LocalLLaMA
ikawrakow/ik_llama.cpp
github.com · 41w · Hacker News, r/LocalLLaMA
Don't sleep on the new Nemotron Cascade
huggingface.co · 6w · r/LocalLLaMA
TGI is in maintenance mode. Time to switch?
huggingface.co · 6w · r/LocalLLaMA
feat: native MTP speculative decoding for Qwen3.5 by AirRunner · Pull Request #990
github.com · 6w · r/LocalLLaMA
Eamon2009/Transformer-language-model: An educational implementation of a GPT-style language model, built from scratch in PyTorch to show how transformer-based AI models work. No pre-trained weights, no fine-tuning; can be trained on a $300 laptop
github.com · 6w · r/LocalLLaMA
[Bug]: The hit rate of prefix caching in Qwen3.5 35BA3B is very low, always less than 0.1% · Issue #36493
github.com · 6w · r/LocalLLaMA
vasilyevdm/ai-agent-handbook: Comprehensive guide to AI agent engineering: how 30+ frameworks actually work under the hood. Covers context rot, compaction, system prompt assembly, SOUL.md, agent loops, memory systems, tool sprawl, MCP, progressive disclosure, multi-agent orchestration, Plan/Act, and episodic memory, with code examples throughout. Pick the right stack and avoid the common traps
github.com · 6w · r/LocalLLaMA
My gripe with Qwen3.5 35B and my first fine-tune fix
huggingface.co · 6w · r/LocalLLaMA
LongCat-Flash-Prover: A new frontier for open-source formal reasoning
huggingface.co · 6w · r/LocalLLaMA
Kimi just published a paper replacing residual connections in transformers. Results look legit
github.com · 6w · Hacker News, r/LocalLLaMA
Add Qwen3 TTS architecture support by Acceldium · Pull Request #20752
github.com · 6w · r/LocalLLaMA
rednote-hilab/dots.mocr
huggingface.co · 6w · r/LocalLLaMA