Scour
LocalLlama · reddit.com
Achilles1089/duplex-chat: AI that thinks while you type. Speculative inference protocol that eliminates perceived latency in AI chat.
github.com · 5w · r/LocalLLaMA
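The idea behind the duplex-chat entry can be sketched as: speculatively run inference on the draft message while the user is still typing, then serve the cached result on submit if the draft did not change. A minimal synchronous sketch — the `model_generate` stub and the class below are hypothetical illustrations, not the repo's API; a real implementation would run generation asynchronously and cancel stale drafts:

```python
import time

def model_generate(prompt: str) -> str:
    """Stand-in for a local LLM call (hypothetical; a real call is slow)."""
    time.sleep(0.01)  # simulate inference latency
    return f"reply to: {prompt}"

class SpeculativeChat:
    """Speculate on the draft message during typing; on submit, serve the
    cached response if the draft is unchanged (near-zero perceived latency)."""

    def __init__(self):
        self._draft = None
        self._speculated = None

    def on_keystroke(self, draft: str) -> None:
        # Real systems would do this async and cancelable; here it is sync.
        self._draft = draft
        self._speculated = model_generate(draft)

    def on_submit(self, message: str) -> tuple[str, bool]:
        if message == self._draft and self._speculated is not None:
            return self._speculated, True        # cache hit: answer is ready
        return model_generate(message), False    # miss: normal inference path

chat = SpeculativeChat()
chat.on_keystroke("hello wor")
chat.on_keystroke("hello world")
reply, hit = chat.on_submit("hello world")
```

The win is entirely in perceived latency: total compute goes up (wasted speculations), but the common case of "submit exactly what was typed" returns instantly.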
zolotukhin/zinc: Zig INferenCe Engine — LLM inference for AMD RDNA3/RDNA4 GPUs via Vulkan
github.com · 5w · Hacker News, r/LocalLLaMA, r/Zig
Inference speed comparisons between M1 Pro and maxed-out M4 Max
github.com · 61w · r/LocalLLaMA
yashkc2025/turboquant: Python implementation of TurboQuant (arXiv 2504.19874). Data-oblivious, near-optimal 1–4 bit vector quantization for streaming KV-caches and databases.
github.com · 5w · r/LocalLLaMA
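TurboQuant's setting — data-oblivious low-bit vector quantization — can be illustrated with the classic 1-bit recipe: apply a fixed random rotation so each vector's energy spreads evenly across coordinates, then store only the signs plus one scale per vector. This is a generic sketch of that idea, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64

# Data-oblivious preprocessing: one fixed random orthonormal rotation makes
# every input vector look roughly isotropic, so per-coordinate sign
# quantization loses little information regardless of the data distribution.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))

def quantize_1bit(v):
    r = Q @ v
    # One sign bit per coordinate, plus a single scale chosen so the
    # reconstruction preserves the vector's norm.
    return np.sign(r), np.linalg.norm(r) / np.sqrt(d)

def dequantize(bits, scale):
    return Q.T @ (bits * scale)

v = rng.standard_normal(d)
bits, scale = quantize_1bit(v)
v_hat = dequantize(bits, scale)
cos = v @ v_hat / (np.linalg.norm(v) * np.linalg.norm(v_hat))
```

For rotated (near-Gaussian) coordinates the expected cosine similarity of this 1-bit reconstruction is about sqrt(2/pi) ≈ 0.8; higher bit widths refine each coordinate rather than just its sign.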
mahmoudsamy7729/agentic-rag: A clean, modular implementation of an Agentic RAG (Retrieval-Augmented Generation) system built with a production-ready architecture.
github.com · 5w · r/LocalLLaMA
Nvidia Kimodo: kinematic motion diffusion model trained on mocap data
research.nvidia.com · 7w · Hacker News, r/LocalLLaMA
Day 27 of building an autonomous AI lab with real capital.
descubriendoloesencial.substack.com · 5w · r/LocalLLaMA, r/SideProject
Inference Engines — A visual deep dive into the journey of a token down the transformer layers
femiadeniran.com · 5w · r/LocalLLaMA
Al0olo/voxtral-voice-clone: Training the missing codec encoder for Mistral's Voxtral-4B-TTS, enabling zero-shot voice cloning
github.com · 5w · r/LocalLLaMA
Running Qwen 3.5 (122B) with ~72GB of VRAM
huggingface.co · 9w · r/LocalLLaMA
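The ~72GB figure is consistent with back-of-envelope arithmetic: 122B weights at roughly 4.5 bits each (a typical 4-bit quantization format once per-block scales are counted — an assumption, not the model card's number), plus a few GB of KV cache for a hypothetical model config:

```python
def quant_weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Decimal gigabytes needed for the quantized weights alone."""
    return n_params * bits_per_weight / 8 / 1e9

# 122B parameters at ~4.5 effective bits/weight (assumption)
weights = quant_weights_gb(122e9, 4.5)  # ~68.6 GB

# Rough fp16 KV-cache cost for a hypothetical config: 80 layers,
# 8 KV heads, head_dim 128, 8k context, 2 bytes/element,
# and a factor 2 for keys + values:
kv = 80 * 2 * 8 * 128 * 8192 * 2 / 1e9  # ~2.7 GB

total = weights + kv  # ~71.3 GB, in line with the ~72GB claim
```

Longer contexts, larger batch sizes, or a less aggressive quant push the total past 72GB quickly, which is why KV-cache quantization often accompanies setups like this.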
Agentic OS that replaces OpenClaw
github.com · 5w · r/LocalLLaMA
nicedreamzapp/claude-code-local: Run Claude Code with local AI on Apple Silicon. 122B model at 41 tok/s with Google TurboQuant. No cloud, no API fees.
github.com · 5w · r/ClaudeAI, r/LocalLLaMA
Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF
huggingface.co · 5w · r/LocalLLaMA
Breaking change in llama-server?
github.com · 5w · r/LocalLLaMA
Llama.cpp with TurboQuant, Heavy-Hitter Oracle (H2O), and StreamingLLM. Even more performance!
github.com · 5w · r/LocalLLaMA
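Of the two cache-eviction schemes named in that entry, StreamingLLM is the simpler: keep the first few "attention sink" tokens forever plus a sliding window of recent tokens, and evict everything in between (H2O additionally retains high-attention "heavy hitter" tokens). A minimal sketch of the StreamingLLM keep/evict policy over token positions:

```python
from collections import deque

class StreamingKVCache:
    """StreamingLLM-style eviction sketch: the first `n_sink` token
    positions are kept forever (attention sinks), the most recent
    `window` positions are kept in a sliding window, and everything
    in between is evicted."""

    def __init__(self, n_sink: int = 4, window: int = 8):
        self.n_sink = n_sink
        self.sink = []                        # positions kept forever
        self.recent = deque(maxlen=window)    # sliding window of positions

    def append(self, pos: int) -> None:
        if len(self.sink) < self.n_sink:
            self.sink.append(pos)
        else:
            self.recent.append(pos)           # deque evicts the oldest itself

    def kept(self) -> list[int]:
        return self.sink + list(self.recent)

cache = StreamingKVCache(n_sink=4, window=8)
for pos in range(100):
    cache.append(pos)
```

After 100 tokens only 12 positions remain resident, which is why these schemes let llama.cpp-style runtimes hold long conversations in a bounded KV budget.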
Lowkey-Loki-SN/noflash-attention: Flash-attention-class memory efficiency for GPUs without flash attention
github.com · 5w · r/LocalLLaMA
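Flash-attention-class memory efficiency comes largely from never materializing the full attention-score matrix: keys are processed in chunks with a running max and running softmax denominator (the online-softmax trick), which needs only O(chunk) memory and can be written as plain tensor ops on GPUs lacking fused flash kernels. A single-query NumPy sketch of the general technique (not this repo's code):

```python
import numpy as np

def chunked_attention(q, K, V, chunk: int = 64):
    """Attention output for one query without the full score row in memory:
    iterate over key chunks, maintaining a running max `m`, a running
    softmax denominator, and a running weighted sum of values."""
    m = -np.inf                     # running max of scores seen so far
    denom = 0.0                     # running softmax denominator
    out = np.zeros(V.shape[1])      # running weighted value sum
    for i in range(0, len(K), chunk):
        s = K[i:i + chunk] @ q              # scores for this chunk only
        m_new = max(m, s.max())
        scale = np.exp(m - m_new) if np.isfinite(m) else 0.0
        p = np.exp(s - m_new)
        denom = denom * scale + p.sum()     # rescale old sums to the new max
        out = out * scale + p @ V[i:i + chunk]
        m = m_new
    return out / denom

rng = np.random.default_rng(0)
q = rng.standard_normal(16)
K = rng.standard_normal((256, 16))
V = rng.standard_normal((256, 32))
result = chunked_attention(q, K, V)
```

The rescaling by `exp(m - m_new)` is what keeps the incremental softmax numerically identical to the naive two-pass version.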
ggml: allow prefetching tensor overrides by am17an · Pull Request #21067
github.com · 5w · r/LocalLLaMA
I built a fully local GraphRAG pipeline (0 GPUs needed) using Llama 3.1, Neo4j, and LangChain. Code included!
github.com · 5w · r/LocalLLaMA, r/vibecoding
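The shape of a GraphRAG pipeline — extract entity triples, store them in a graph, and answer questions by retrieving the neighborhood of the query's entities as LLM context — can be sketched without Neo4j or LangChain. The triple store and hard-coded triples below are toys standing in for the poster's actual stack (a real pipeline would have a local LLM extract the triples):

```python
from collections import defaultdict

# Adjacency-list triple store standing in for a graph database.
graph = defaultdict(list)

def add_triple(subj: str, rel: str, obj: str) -> None:
    graph[subj].append((rel, obj))
    graph[obj].append((f"inverse:{rel}", subj))  # allow traversal both ways

def retrieve(entity: str, hops: int = 1):
    """Collect the k-hop neighborhood of an entity; in GraphRAG these
    facts become the retrieved context handed to the generator LLM."""
    frontier, seen, facts = {entity}, {entity}, []
    for _ in range(hops):
        nxt = set()
        for node in frontier:
            for rel, other in graph[node]:
                facts.append((node, rel, other))
                if other not in seen:
                    seen.add(other)
                    nxt.add(other)
        frontier = nxt
    return facts

# Hypothetical extracted triples, for illustration only:
add_triple("Llama 3.1", "created_by", "Meta")
add_triple("Llama 3.1", "used_in", "GraphRAG pipeline")
context = retrieve("Llama 3.1")
```

Graph retrieval's advantage over plain vector search is multi-hop: raising `hops` pulls in facts about entities the query never mentions but is connected to.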
RED-BASE/SpruceChat: A tiny AI that lives inside your handheld. Local LLM chat on spruceOS.
github.com · 5w · r/LocalLLaMA
ARC-AGI-3
arcprize.org · 41w · Hacker News, r/LocalLLaMA