Scour
🧠 LLM Inference
Quantization, Attention Mechanisms, Batch Processing, KV Caching
Scoured 26922 posts in 972.8 ms
How we cut Vertex AI latency by 35% with GKE Inference Gateway
cloud.google.com · 4d
🧠 Inference Serving
The LLM Judge Controversy
mlfrontiers.substack.com · 2d · Discuss: Substack
🏆 LLM Benchmarking
The Ur-model Cometh
kill-the-newsletter.com · 2d
🆕 New AI
Alibaba Pushes Into Robotics AI With Open-Source ‘RynnBrain’
bloomberg.com · 22h
🆕 New AI
bartowski/moonshotai_Kimi-Linear-48B-A3B-Instruct-GGUF
huggingface.co · 1d · Discuss: r/LocalLLaMA
🚀 Astral
Securing GenAI: Vol 4 — Fundamentals of AI model security
pub.towardsai.net · 1d
🛡️ AI Security
Drifting models
breno.bearblog.dev · 1d
📊 Model Serving Economics
Grow with the Flow: 4D Reconstruction of Growing Plants Gaussian Flow Fields
weihanluo.ca · 20h
📦 Batch Embeddings
Gemini thinking | Gemini API | Google AI for Developers
ai.google.dev · 1d
✨ Gemini
(6) How foreground-swapping can improve LLM response usability
mikecaulfield.substack.com · 3d · Discuss: Substack
🪄 Prompt Engineering
*There is no conceptual or practical path from what you describe to what modern ...
news.ycombinator.com · 1d · Discuss: Hacker News
🕸️ Sparse Vectors
Reinforcement Inference: Leveraging Uncertainty for Self-Correcting Language Model Reasoning
arxiv.org · 1d
🏗️ LLM Infrastructure
HQP: Sensitivity-Aware Hybrid Quantization and Pruning for Ultra-Low-Latency Edge AI Inference
arxiv.org · 2d
📱 Edge AI Optimization
World Models and the Data Problem in Robotics
joeljang.github.io · 1d · Discuss: Hacker News
✨ Gemini
Show HN: Fine-tuned Qwen2.5-7B on 100 films for probabilistic story graphs
cinegraphs.ai · 2d · Discuss: Hacker News
🏗️ LLM Infrastructure
Oatmeal - Constraint propagation for fun
eli.li · 3d · Discuss: Lobsters, Hacker News
🧮 SMT Solvers
AI is dominating the world’s memory chips. That could make phones more expensive
restofworld.org · 1d · Discuss: Hacker News
🔗 Technology Supply Chains
[RFC PATCH v1 0/4] Machine Learning (ML) library in Linux kernel
lore.kernel.org · 4d · Discuss: Lobsters, Hacker News
🦙 Ollama
Blog: I Benchmarked Popular MLX Models That Fit on iPhone and iPad — Here's How Fast On-Device LLMs Actually Are
rickytakkar.com · 2d · Discuss: Hacker News
📊 Model Serving Economics
Designing and Using Combinators: The Essence of Functional Programming
cse.chalmers.se · 1d · Discuss: Hacker News
💻 Programming languages