Prompt optimizations for LLM serving
How to Measure Time To First Token (TTFT) in AI Systems
🔧 Systems-level optimizations for LLM serving
qainsights.com · 5 days ago · Hacker News
Characterizing Software Aging in GPU-Based LLM Serving Systems
🔧 Systems-level optimizations for LLM serving
Content type: Academic
arxiv.org · 19 hours ago
NetX-lab/Frontier: Frontier: A Discrete-Event Simulator for Modern LLM Serving
🔧 Systems-level optimizations for LLM serving
Content type: Code
github.com · 16 hours ago · Hacker News
Less-relevant results
Big Blue’s Redbook on Storage Scale KV Cache management
🔧 Systems-level optimizations for LLM serving
Content type: News
blocksandfiles.com · 2 days ago
How I built a three-tier content quality ladder for programmatic directory ETL
🔧 Systems-level optimizations for LLM serving
platform.claude.com · 5 days ago · DEV
Report: GKE Inference Gateway delivers up to 92% faster AI responses
🧠 Large Language Models (LLMs)
Content type: Blog
cloud.google.com · 2 days ago · Hacker News
Prompt Caching Explained: The AI Concept That Can Save Millions of Tokens
🧠 Large Language Models (LLMs)
Content type: Blog
sweta-nit.medium.com · 15 hours ago
Claude vs GPT-4: Which AI API Is Better for Developers? (2026)
🧠 Large Language Models (LLMs)
kalyna.pro · 6 days ago · DEV
"North Mini Code"; open weights, 30B param, Canadian coding model
🤖 Agents using LLMs
Content type: Blog
cohere.com · 2 days ago · Hacker News
How to cut the cost of long AI agent threads (without making the agent dumber)
🤖 Agents using LLMs
Content type: Blog
viktor.com · 3 days ago · Hacker News
Intelligent inference scheduling with llm-d on Red Hat AI
🔧 Systems-level optimizations for LLM serving
developers.redhat.com · 23 hours ago
Deep Dive into LLM Token Cost — Blog Series Part 2: How Prompt Caching Actually Works
🧠 Large Language Models (LLMs)
Content type: Blog
weidongzhou.wordpress.com · 5 days ago · Hacker News
What Breaks When Multi-Agent Systems Scale
🤖 Agents using LLMs
digitalocean.com · 1 day ago
How Ecolab rebuilt retail intelligence on Databricks and Anthropic Claude
🔍 Retrieval-augmented generation
Content type: Blog
databricks.com · 8 hours ago
From GPU to Token: The 8-Layer Observability Stack for AI Infrastructure
📊 AI Performance Profiling
Content type: Blog
jimmysong.io · 2 days ago
1-bit and 1.58 bit LLM Benchmarking on Jetson Orin Nano Super | Bonsai LM
🧠 Large Language Models (LLMs)
smolhub.com · 3 days ago · r/LocalLLaMA
SPEAR: A System for Post-Quantization Error-Adaptive Recovery Enabling Efficient Low-Bit LLM Serving
✨ Model optimizations in LLMs
Content type: Academic
arxiv.org · 19 hours ago
harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.
🔧 Systems-level optimizations for LLM serving
Content type: Code
github.com · 5 days ago · Hacker News, r/LLM
Announcing the Path to Production for Agents Webinar Series
🤖 Agents using LLMs
techcommunity.microsoft.com · 2 days ago
Claude Opus is more performant on OpenCode than Claude Code
📊 AI Performance Profiling
Content type: Discussion
artificialanalysis.ai · 11 hours ago · Hacker News