Scour
💾 Prompt Caching
Context Reuse, KV Cache, Inference Optimization, Token Efficiency
Scoured 20768 posts in 429.7 ms
1M token context: The good, the bad and the ugly (2025)
micron.com · 2d · Discuss: Hacker News
🏗️ LLM Infrastructure
SalesforceAIResearch/promptomatix: An Automatic Prompt Optimization Framework for Large Language Models
github.com · 2d
🪄 Prompt Engineering
Technical writeup: Implementing Discord’s rate limiting, gateway management, and “clarity over magic”
scurry-works.github.io · 6h · Discuss: r/programming
🌐 ActivityPub Protocol
Creeping memory allocation
community.folivora.ai · 13h
🧠 Memory Allocation Strategies
ProphetKV: User-Query-Driven Selective Recomputation for Efficient KV Cache Reuse in Retrieval-Augmented Generation
arxiv.org · 4d
⚡ Vectorized Execution
Achieving Ultra-Fast AI Chat Widgets
cjroth.com · 22h · Discuss: Hacker News
🪄 Prompt Engineering
Processing 11 million rows in minutes instead of hours
yellowduck.be · 11h
⚡ Vectorized Execution
Build an AI RAG Chatbot with n8n, Google Drive & Gemini
lumberjack.so · 12h
✨ Gemini
What I wish I knew before building a vibe coding platform
imagine.dev · 2d · Discuss: Hacker News
🔄 Incremental Computation
How to Reduce Telemetry Volume by 40% Smartly
newsletter.signoz.io · 11h · Discuss: r/programming
📡 Network Latency
When Language Models Get Stuck: The Mechanics of Repetition Loops
pub.towardsai.net · 1d
🔤 Tokenization
Performance Tip of the Week #79: Make at most one tradeoff at a time
abseil.io · 22h
⚙️ Mechanical Sympathy
Show HN: LocalGPT – A local-first AI assistant in Rust with persistent memory
news.ycombinator.com · 16h · Discuss: Hacker News
🔎 Tantivy
Introduction to Flakes
nixos-and-flakes.thiscute.world · 12h
🔍 Quickwit
Speeding Up HTML Generation by 2000%
bobrubbens.nl · 2d
🛠️ Build Optimization
Show HN: A Prompting Framework for Non-Vibe-Coders
github.com · 13h · Discuss: Hacker News
🪄 Prompt Engineering
How we cut Vertex AI latency by 35% with GKE Inference Gateway
cloud.google.com · 2d
🧠 Inference Serving
2026 Week 6 Digest
anarchaeopteryx.bearblog.dev · 2h
💻 Chips
Unlocking core memories with GoldSrc engine and CS 1.6 (2025)
danielbrendel.com · 8h · Discuss: Hacker News
🏹 Apache Arrow
25W06. Learning a language with the machine
z1nz0l1n.com · 9h
🔤 Tokenization