LLM serving frameworks

Self-hosted remote access for Ollama without complicated setup

 🔧Systems-level optimizations for LLM serving

MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better

 Model optimizations in LLMs  Content type: News  Content type: Blog

RKSC: Reasoning-Aware KV Cache Sharing and Confident Early Exit for Multi-Step LLM Inference

 🔧Systems-level optimizations for LLM serving  Content type: Academic
arxiv.org·

Tales of an Ollama Honeypot (Part 3): More Traffic, More Findings

 🔧Systems-level optimizations for LLM serving
posts.inthecyber.com·

The Death of the Four Golden Signals: Designing Telemetry for Non-Deterministic Infrastructure

 🔍Retrieval-augmented generation
devops.com·

Less-relevant results

Google's new open model DiffusionGemma generates text from noise instead of word by word

 🧠Large Language Models (LLMs)
the-decoder.com·

NexusOS v2.0 – A zero-dependency pipeline streaming server chaos to Parquet

 🔧Systems-level optimizations for LLM serving

fix(agents): project thinking catalog compat · openclaw/openclaw@68ec783

 🤖Agents using LLMs  Content type: Code
github.com·

For whom the door-bell tolls

 🧠Large Language Models (LLMs)
ceph.io·

"AI" Is Eating Platform Monopolist Free Cash Flow, Not the World: CHART OF THE DAY

 🧠Large Language Models (LLMs)  Content type: News  Content type: Blog

What's in the Box? A Field Guide to AI Models

 🧠Large Language Models (LLMs)  Content type: Blog
iankduncan.com·

[AINews] FrontierCode: Benchmarking for Code Quality over Slop

 🧠Large Language Models (LLMs)  Content type: News
latent.space·

Google DeepMind releases Gemma 4 QAT, but Unsloth developer Daniel Han warns naive llama.cpp conversions suffer accuracy loss

 Model optimizations in LLMs  Content type: News
digg.com·

I Processed 2.4 Billion Tokens Across 52 AI Models for $0.52. Here's the Full Breakdown.

 🤖Agents using LLMs
saintlex.sbs··DEV

RakuOS fixes the one thing that annoys me most about immutable Linux distros

 🔧Systems-level optimizations for LLM serving  Content type: News
zdnet.com·

Latest technical articles & videos.

 🌐Distributed LLM Systems
certdepot.net·

Creating ADK Agent using locally running Gemma 4

 Model optimizations in LLMs  Content type: Blog
medium.com·

Alignment Collapse Under KV Cache Quantization: Diagnosis and Mitigation

 🔧Systems-level optimizations for LLM serving  Content type: Academic
arxiv.org·

[AINews] Open Models, Model Labs vs Agent Labs, and What's Untrainable — Sarah Guo

 🧠Large Language Models (LLMs)  Content type: News
latent.space·

How to Measure Time To First Token (TTFT) in AI Systems

 💬Prompt optimizations for LLM serving