LLM serving frameworks
Scoured 236 posts in 7.0 ms
Self-hosted remote access for Ollama without complicated setup
🔧 Systems-level optimizations for LLM serving · oab.arc-i.co.uk · 4 days ago · r/selfhosted
MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better
✨ Model optimizations in LLMs · Content type: News, Blog · kaitchup.substack.com · 6 days ago · r/LocalLLaMA
RKSC: Reasoning-Aware KV Cache Sharing and Confident Early Exit for Multi-Step LLM Inference
🔧 Systems-level optimizations for LLM serving · Content type: Academic · arxiv.org · 1 day ago
Tales of an Ollama Honeypot (Part 3): More Traffic, More Findings
🔧 Systems-level optimizations for LLM serving · posts.inthecyber.com · 3 days ago
The Death of the Four Golden Signals: Designing Telemetry for Non-Deterministic Infrastructure
🔍 Retrieval-augmented generation · devops.com · 6 days ago
Less-relevant results
Google's new open model DiffusionGemma generates text from noise instead of word by word
🧠 Large Language Models (LLMs) · the-decoder.com · 1 day ago
NexusOS v2.0 – A zero-dependency pipeline streaming server chaos to Parquet
🔧 Systems-level optimizations for LLM serving · huggingface.co · 3 days ago · Hacker News
fix(agents): project thinking catalog compat · openclaw/openclaw@68ec783
🤖 Agents using LLMs · Content type: Code · github.com · 12 hours ago
For whom the door-bell tolls
🧠 Large Language Models (LLMs) · ceph.io · 1 day ago
"AI" Is Eating Platform Monopolist Free Cash Flow, Not the World: CHART OF THE DAY
🧠 Large Language Models (LLMs) · Content type: News, Blog · braddelong.substack.com · 3 days ago · Substack
What's in the Box? A Field Guide to AI Models
🧠 Large Language Models (LLMs) · Content type: Blog · iankduncan.com · 3 days ago
[AINews] FrontierCode: Benchmarking for Code Quality over Slop
🧠 Large Language Models (LLMs) · Content type: News · latent.space · 2 days ago
Google DeepMind releases Gemma 4 QAT, but Unsloth developer Daniel Han warns naive llama.cpp conversions suffer accuracy loss
✨ Model optimizations in LLMs · Content type: News · digg.com · 6 days ago
I Processed 2.4 Billion Tokens Across 52 AI Models for $0.52. Here's the Full Breakdown.
🤖 Agents using LLMs · saintlex.sbs · 21 hours ago · DEV
RakuOS fixes the one thing that annoys me most about immutable Linux distros
🔧 Systems-level optimizations for LLM serving · Content type: News · zdnet.com · 2 days ago
Latest technical articles & videos.
🌐 Distributed LLM Systems · certdepot.net · 5 days ago
Creating ADK Agent using locally running Gemma 4
✨ Model optimizations in LLMs · Content type: Blog · medium.com · 4 days ago
Alignment Collapse Under KV Cache Quantization: Diagnosis and Mitigation
🔧 Systems-level optimizations for LLM serving · Content type: Academic · arxiv.org · 1 day ago
[AINews] Open Models, Model Labs vs Agent Labs, and What's Untrainable — Sarah Guo
🧠 Large Language Models (LLMs) · Content type: News · latent.space · 21 hours ago
How to Measure Time To First Token (TTFT) in AI Systems
💬 Prompt optimizations for LLM serving · qainsights.com · 5 days ago · Hacker News