Scour
💬 Prompt optimizations for LLM serving
Scoured 65 posts in 7.6 ms
CommBench: Can LLMs Write Correct and Efficient GPU Communication Code?
⚙️ AI Infrastructure Automation · uccl-project.github.io · 18 hours ago · Hacker News
What Arm-based innovations happened in May 2026?
🔧 Systems-level optimizations for LLM serving · Content type: Blog · newsroom.arm.com · 6 days ago
SIFT: Selective-Index For Fast Compute of RAG Prefill by Exploiting Attention Invariance
🔍 Retrieval-augmented generation · Content type: Academic · arxiv.org · 2 days ago
MLPerf and the rise of latency-aware LLM benchmarking
🧠 Large Language Models (LLMs) · edn.com · 6 days ago
For whom the door-bell tolls
🧠 Large Language Models (LLMs) · ceph.io · 1 day ago
Claude Fable 5 and Mythos 5 pricing: Anthropic's new $10/$50 top tier
🧠 Large Language Models (LLMs) · aipricing.guru · 2 days ago · Hacker News
"North Mini Code"; open weights, 30B param, Canadian coding model
🤖 Agents using LLMs · Content type: Blog · cohere.com · 3 days ago · Hacker News
The Death of the Four Golden Signals: Designing Telemetry for Non-Deterministic Infrastructure
🔍 Retrieval-augmented generation · devops.com · 6 days ago
Comparing Claude Fable 5's system prompt to Opus 4.8
🧠 Large Language Models (LLMs) · Content type: Blog · twelvetables.blog · 2 days ago · Hacker News
Semantic Cache Distillation: Efficient State Transfer via Reuse and Selective Patching
🔧 Systems-level optimizations for LLM serving · Content type: Academic · arxiv.org · 2 days ago
Claude Mythos 5 / Fable 5
📊 AI Performance Profiling · Content type: Discussion · anthropic.com · 2 days ago · Hacker News
Architecting the Control Plane for Intelligence: System Design of an Enterprise AI Gateway
🤖 Agents using LLMs · Content type: Blog · medium.com · 3 days ago
Build a local voice agent with Red Hat OpenShift AI
🧠 Large Language Models (LLMs) · developers.redhat.com · 4 days ago
Why It’s So Hard for Older B2B Leaders to Compete in AI: Your Customers Can Do A Lot in Claude for $20-$200/Month. And You’re Paying $1.00 Per API Call For the Good Stuff.
⚙️ AI Infrastructure Automation · saastr.com · 2 days ago
The all-you-can-eat AI era is over. It's time to count calories.
🤖 Agents using LLMs · Content type: News · businessinsider.com · 1 day ago
What Should a Skill Remember? Quality-Cost Trade-offs in Cost-Aware Skill Rewriting for Language Model Agents
🧠 Large Language Models (LLMs) · Content type: Academic · arxiv.org · 2 days ago
How we fight GPU scarcity without compromise
🧠 Large Language Models (LLMs) · Content type: Blog · equixly.com · 6 days ago · Hacker News
Issue #390 - The ML Engineer 🤖
✨ Model optimizations in LLMs · Content type: News, Blog · machinelearning.substack.com · 4 days ago · Substack
TjWheeler/deep-memory: A GraphRAG implementation with a Vocabulary system to optimise AI integration
🤖 Agents using LLMs · Content type: Code · github.com · 2 days ago · Hacker News
Fairness-Aware and Latency-Controllable Scheduling for Chunked-Prefill LLM Serving
🔧 Systems-level optimizations for LLM serving · Content type: Academic · arxiv.org · 2 days ago