AI Engineering

Scoured 219 posts in 5.5 ms

Architecturally Significant MLOps Guidelines for ML Model Integration and Deployment: a Gray Literature Review

 🔩ML Compilers  Content type: Academic
arxiv.org·

Intelligent inference scheduling with llm-d on Red Hat AI

 🔧Backend Dev
developers.redhat.com·

microsoft/LLMLingua: [EMNLP'23, ACL'24] To speed up LLMs' inference and enhance LLM's perceive of key information, compress the prompt and KV-Cache, which achieves up to 20x compression with minimal performance loss.

 🧠LLM Research  Content type: Code
github.com··DEV

How ERGO Hestia reduced time-to-market with Lakebase and Mosaic AI Model Serving

 🌐Distributed Systems  Content type: Blog
databricks.com·

Lowest-Cost LLM Inference: The Complete OpenRouter Guide

 🔧Backend Dev  Content type: Blog  Content type: Discussion  Content type: Tutorial
openrouter.ai·

Infrastructure Options for Scalable AI Inference

 🧠LLM Research  Content type: Blog
mirantis.com·

Training the Model Was Only 20% of the Job: Lessons from Building an MLOps Platform

 🧠LLM Research  Content type: Blog
medium.com·

detects when ML research consensus is shifting using Bayesian CUSUM

 🧠LLM Research
tattvaai.org··Hacker News

All sorts of famous Attention Layers

 🖥️OS Development  Content type: Blog
harsh-ps-2003.bearblog.dev·

A Complete Beginner's Guide to Local LLM Inference

 🧠LLM Research  Content type: Blog
khnsakhnm.medium.com·

Inferoa AI harness claimed 90% cache savings. We ran it and measured 97.8%

 🔩ML Compilers

The Quantum Leap in LLM Inference: How Modern Architectures Predict Tokens at Warp Speed Without…

 🧠LLM Research  Content type: Blog
medium.com·

The Beginner MLOps Guide I Wish I Had — Versioning, Deployment, Monitoring, and Drift

 🔩ML Compilers  Content type: Blog
medium.com·

lightmetal: GPU LLM Inference From a Single Java 25 JAR

 🎮GPU Programming  Content type: Blog
adambien.blog·

Metrics that Matter with Serverless Inference

 🌐Network Protocols
digitalocean.com·

DeepSeekV4 1.6T Day 0 to Day 43 Performance Over Time - Huawei, GB300 NVL72, MI355X, B200

 🎮GPU Programming  Content type: News

12B Gemma 4 QAT Deployment with NVIDIA L4, Cloud Run, MCP, and Antigravity CLI

 🔮Multimodal AI  Content type: Blog
medium.com·

Unsloth Minimax M3 GGUF

 🦀Rust
huggingface.co··r/LocalLLaMA

New comment by okl1m3k in "Ask HN: Who wants to be hired? (June 2026)"

 🔮Multimodal AI  Content type: Reference
docs.google.com··Hacker News

A system programmer’s guide to LLM inference

 🎮GPU Programming  Content type: Blog
