Scour
🤖 LLM Inference
Specific
Model Serving, Quantization, vLLM, ONNX Runtime
Scoured 358 posts in 15.0 ms
Blazing fast on-device GenAI with LiteRT-LM
🦙
llama.cpp
developers.googleblog.com
·
1d
Gemini’s AI Comeback, TPU Wars, & Karpathy Returns
🤖
AI
briefing.forwardfuture.ai
·
18h
Meta's WhatsApp Incognito Chat puts AI conversations in a black box
🦙
llama.cpp
ppc.land
·
3d
LLM Observability with Self-Hosted Langfuse and vLLM
🦙
llama.cpp
pyimagesearch.com
·
2d
QClaw: A Fully Local Agentic Assistant on the Arduino Uno Q
🦙
llama.cpp
hackster.io
·
23h
ImpactArbiter – A PyTorch autograd trap for LLM memory bugs
🦙
llama.cpp
github.com
·
2d
·
Hacker News
Ollama Doesn't Know Its GPU Is on Another Machine
🦙
llama.cpp
loopholelabs.io
·
15h
·
Hacker News
A cheap fix that saves the AI $400M dollars a year and brings 4B people online
⚙️
Zig
codecai.net
·
3d
·
Hacker News
Cerebras Brings Kimi K2.6 Inference to Enterprises
🤖
AI
cerebras.ai
·
1d
·
Hacker News
Four-Tier Memory Hierarchy for LLM Reasoning (USC, UW)
🦙
llama.cpp
semiengineering.com
·
11h
DeepSeek Agent Harness: Technical deep-dive & the open-source blueprint
💾
SQLite
dlcmh.github.io
·
3h
·
Hacker News
ROCm 7 on Strix Halo: Benchmarking the New Toolbox Images
🦙
llama.cpp
sleepingrobots.com
·
4d
Announcing OpenAI-compatible API support for Amazon SageMaker AI endpoints
🤖
AI
aws.amazon.com
·
5h
I replaced GitHub Copilot with a self-hosted AI and I won’t go back
⚙️
Zig
xda-developers.com
·
10h
Qwen3.6-27B-UD-Q4_K_XL.gguf · unsloth/Qwen3.6-27B-MTP-GGUF at main
🦙
llama.cpp
huggingface.co
·
3d
·
r/LocalLLaMA
SpecSA: Bridging Speculative Decoding and Sparse Attention for Efficient LLM Inference
🦙
llama.cpp
arxiv.org
·
1d
AI runs on tokens. There’s a missing artifact between them.
🤖
AI
medium.com
·
2d
Towards local plug-and-play AI
🦙
llama.cpp
adlrocha.substack.com
·
3d
·
Substack
Initial Benchmarks Of The SpacemiT K3 RVA23 RISC-V CPU With The K3 Pico-ITX
🧠
Memory Allocators
phoronix.com
·
13h
·
Hacker News
HF downloader utility tampermonkey
🦙
llama.cpp
greasyfork.org
·
2d
·
r/LocalLLaMA