AI Engineering
AI infrastructure, model serving, inference, MLOps
Scoured 219 posts in 5.5 ms
Architecturally Significant MLOps Guidelines for ML Model Integration and Deployment: a Gray Literature Review
🔩 ML Compilers · Content type: Academic · arxiv.org · 5 days ago
Intelligent inference scheduling with llm-d on Red Hat AI
🔧 Backend Dev · developers.redhat.com · 2 days ago
microsoft/LLMLingua: [EMNLP'23, ACL'24] To speed up LLMs' inference and enhance LLM's perceive of key information, compress the prompt and KV-Cache, which achieves up to 20x compression with minimal performance loss.
🧠 LLM Research · Content type: Code · github.com · 10 hours ago · DEV
How ERGO Hestia reduced time-to-market with Lakebase and Mosaic AI Model Serving
🌐 Distributed Systems · Content type: Blog · databricks.com · 1 day ago
Lowest-Cost LLM Inference: The Complete OpenRouter Guide
🔧 Backend Dev · Content type: Blog, Discussion, Tutorial · openrouter.ai · 22 hours ago
Infrastructure Options for Scalable AI Inference
🧠 LLM Research · Content type: Blog · mirantis.com · 3 days ago
Training the Model Was Only 20% of the Job: Lessons from Building an MLOps Platform
🧠 LLM Research · Content type: Blog · medium.com · 1 day ago
detects when ML research consensus is shifting using Bayesian CUSUM
🧠 LLM Research · tattvaai.org · 22 hours ago · Hacker News
All sorts of famous Attention Layers
🖥️ OS Development · Content type: Blog · harsh-ps-2003.bearblog.dev · 34 minutes ago
A Complete Beginner's Guide to Local LLM Inference
🧠 LLM Research · Content type: Blog · khnsakhnm.medium.com · 2 days ago
Inferoa AI harness claimed 90% cache savings. We ran it and measured 97.8%
🔩 ML Compilers · zozo123.github.io · 3 days ago · Hacker News
The Quantum Leap in LLM Inference: How Modern Architectures Predict Tokens at Warp Speed Without…
🧠 LLM Research · Content type: Blog · medium.com · 21 hours ago
The Beginner MLOps Guide I Wish I Had — Versioning, Deployment, Monitoring, and Drift
🔩 ML Compilers · Content type: Blog · medium.com · 1 day ago
lightmetal: GPU LLM Inference From a Single Java 25 JAR
🎮 GPU Programming · Content type: Blog · adambien.blog · 4 days ago
Metrics that Matter with Serverless Inference
🌐 Network Protocols · digitalocean.com · 1 day ago
DeepSeekV4 1.6T Day 0 to Day 43 Performance Over Time - Huawei, GB300 NVL72, MI355X, B200
🎮 GPU Programming · Content type: News · newsletter.semianalysis.com · 4 days ago · Hacker News · Cited by 1 article
12B Gemma 4 QAT Deployment with NVIDIA L4, Cloud Run, MCP, and Antigravity CLI
🔮 Multimodal AI · Content type: Blog · medium.com · 1 day ago
Unsloth Minimax M3 GGUF
🦀 Rust · huggingface.co · 23 hours ago · r/LocalLLaMA
New comment by okl1m3k in "Ask HN: Who wants to be hired? (June 2026)"
🔮 Multimodal AI · Content type: Reference · docs.google.com · 21 hours ago · Hacker News
A system programmer’s guide to LLM inference
🎮 GPU Programming · Content type: Blog · blog.xiangpeng.systems · 5 days ago · Hacker News