Scour
Prompt optimizations for LLM serving
Scoured 65 posts in 8.8 ms

Building & Benchmarking: LLMs on a 16GB Jetson Orin NX for Hermes Agent
🧠 Large Language Models (LLMs) · Content type: Blog · dnhkng.github.io · 3 days ago

Enabling KV Caching of Shared Prefix for Diffusion Language Models
🔧 Systems-level optimizations for LLM serving · Content type: Academic · arxiv.org · 2 days ago

Show HN: Kikubot – Each AI agent is an inbox
🤖 Agents using LLMs · Content type: Code · github.com · 10 hours ago · Hacker News

Achieving Cloud-Grade SLOs for Local Mixture-of-Experts Inference through CPU-GPU Hybrid Design
⚙️ AI Infrastructure Automation · Content type: Academic · arxiv.org · 1 day ago

RedKnot: Efficient Long-Context LLM Serving with Head-Aware KV Reuse and SegPagedAttention
🔧 Systems-level optimizations for LLM serving · Content type: Academic · arxiv.org · 6 days ago

SpectrumKV: Per-Token Mixed-Precision KV Cache Transfer for Prefill-Decode Disaggregated LLM Serving
🧠 Large Language Models (LLMs) · Content type: Academic · arxiv.org · 2 days ago

aussiealex/agentmeter: Know what your agents cost. Cost intelligence for AI coding agents.
🤖 Agents using LLMs · Content type: Code · github.com · 1 day ago · Hacker News

Tangram: Unlocking Non-Uniform KV Cache for Efficient Multi-turn LLM Serving
🔧 Systems-level optimizations for LLM serving · Content type: Academic · arxiv.org · 6 days ago · Hacker News

Nvidia DGX Spark GB10 – AI Models and Guide with vLLM and Autonomous Script
🔧 Systems-level optimizations for LLM serving · Content type: Code · github.com · 6 days ago · Hacker News

OpenPCC: Open and Confidential LLM Serving on Commodity TEEs
🤖 Agents using LLMs · Content type: Academic · arxiv.org · 1 day ago

TjWheeler/deep-memory: A GraphRAG implementation with a Vocabulary system to optimise AI integration
🤖 Agents using LLMs · Content type: Code · github.com · 3 days ago · Hacker News

Agents All the Way Down; A Methodology for Building Custom AI Agents from Substrate to Production
🤖 Agents using LLMs · Content type: Academic · arxiv.org · 22 hours ago

Week 1 of building Quantamind: Ditching Electron for Rust & Tauri 🦀
🚀 LLM serving frameworks · Content type: Code · github.com · 6 days ago · DEV

How Much Dense Attention is Necessary? Oracle-Guided Sparse Prefill for Full/GQA Layers in Hybrid Long-Context Models
🧠 Large Language Models (LLMs) · Content type: Academic · arxiv.org · 2 days ago

hansstam86/wibeos
🚀 LLM serving frameworks · Content type: Code · github.com · 5 days ago · Hacker News

Vortex: Efficient and Programmable Sparse Attention Serving for AI Agents
🤖 Agents using LLMs · Content type: Academic · arxiv.org · 6 days ago

kenn-io/agentsview: Local-first session intelligence and analytics for coding agents, supporting Claude Code, Codex, and more than 20 other agents. Also: 100x faster replacement for ccusage!
🤖 Agents using LLMs · Content type: Code · github.com · 1 day ago

AGENTSERVESIM: A Hardware-aware Simulator for Multi-Turn LLM Agent Serving
🔧 Systems-level optimizations for LLM serving · Content type: Academic · arxiv.org · 2 days ago

tigerless-labs/cost-xray: See what Claude Code and Codex actually send to the API — and what each part costs.
🧠 Large Language Models (LLMs) · Content type: Code · github.com · 1 day ago · Hacker News

Video-Rate Streaming Stylization on a Vision-Aware MLLM-Conditioned Edit Diffusion: Asymmetric Batched Inference on a Distilled UNet + MLLM Text Encoder
⚡ Real-time AI Systems · Content type: Academic · arxiv.org · 6 days ago