🚀 LLM serving frameworks - pleto · Scour

Self-hosted remote access for Ollama without complicated setup

🔧Systems-level optimizations for LLM serving

oab.arc-i.co.uk··r/selfhosted

MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better

✨Model optimizations in LLMs News Blog

kaitchup.substack.com··r/LocalLLaMA

RKSC: Reasoning-Aware KV Cache Sharing and Confident Early Exit for Multi-Step LLM Inference

🔧Systems-level optimizations for LLM serving Academic

Tales of an Ollama Honeypot (Part 3): More Traffic, More Findings

🔧Systems-level optimizations for LLM serving

posts.inthecyber.com·

The Death of the Four Golden Signals: Designing Telemetry for Non-Deterministic Infrastructure

🔍Retrieval-augmented generation

Less-relevant results

Google's new open model DiffusionGemma generates text from noise instead of word by word

🧠Large Language Models (LLMs)

the-decoder.com·

NexusOS v2.0 – A zero-dependency pipeline streaming server chaos to Parquet

🔧Systems-level optimizations for LLM serving

huggingface.co··Hacker News

fix(agents): project thinking catalog compat · openclaw/openclaw@68ec783

🤖Agents using LLMs Code

For whom the door-bell tolls

🧠Large Language Models (LLMs)

"AI" Is Eating Platform Monopolist Free Cash Flow, Not the World: CHART OF THE DAY

🧠Large Language Models (LLMs) News Blog

braddelong.substack.com··Substack

What's in the Box? A Field Guide to AI Models

🧠Large Language Models (LLMs) Blog

iankduncan.com·

[AINews] FrontierCode: Benchmarking for Code Quality over Slop

🧠Large Language Models (LLMs) News

Google DeepMind releases Gemma 4 QAT, but Unsloth developer Daniel Han warns naive llama.cpp conversions suffer accuracy loss

✨Model optimizations in LLMs News

I Processed 2.4 Billion Tokens Across 52 AI Models for $0.52. Here's the Full Breakdown.

🤖Agents using LLMs

saintlex.sbs··DEV

RakuOS fixes the one thing that annoys me most about immutable Linux distros

🔧Systems-level optimizations for LLM serving News

Latest technical articles & videos.

🌐Distributed LLM Systems

certdepot.net·

Creating ADK Agent using locally running Gemma 4

✨Model optimizations in LLMs Blog

Alignment Collapse Under KV Cache Quantization: Diagnosis and Mitigation

🔧Systems-level optimizations for LLM serving Academic

[AINews] Open Models, Model Labs vs Agent Labs, and What's Untrainable — Sarah Guo

🧠Large Language Models (LLMs) News

How to Measure Time To First Token (TTFT) in AI Systems

💬Prompt optimizations for LLM serving

qainsights.com··Hacker News
