🛠 Ml-eng - miterion · Scour

Architecturally Significant MLOps Guidelines for ML Model Integration and Deployment: a Gray Literature Review

🚀MLOps Academic

Inferoa AI harness claimed 90% cache savings. We ran it and measured 97.8%

📊Profiling Tools

zozo123.github.io··Hacker News

harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.

🔥PyTorch Code

github.com··Hacker News

DeepSeekV4 1.6T Day 0 to Day 43 Performance Over Time - Huawei, GB300 NVL72, MI355X, B200

📈Occupancy Optimization News

newsletter.semianalysis.com··Hacker News

Predicting the World Cup Winner: Live Coding with Hopswor...

hopsworks.ai··Hacker News

Running LLM Inference on Kubernetes: What It Actually Takes

🔱Triton Blog

fairwinds.com

Real-time fraud detection for financial transactions

⚡Flash Attention Blog

Infrastructure Options for Scalable AI Inference

⚙️Systems Programming Blog

15 years of Software Center – A Look in the Mirror and over the Front Windshield

🚀MLOps Blog

metrics.blogg.gu.se

Token4Token — pay-per-token inference on Gnosis + Swarm

⚡ONNX Runtime

t4t.eth.link··Hacker News

AI Serving Platform That Adapts to Your Model

⏱️Benchmarking Blog

databricks.com

Build a Medical Report Analyzer on Dedicated Inference with Python

digitalocean.com

google/gemma-4-31B-it · fix: chat template — null handling, reasoning preservation, turn-tag balance, input validation

huggingface.co··r/LocalLLaMA

NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI

🎮NVIDIA Blog

blogs.nvidia.com

2x GH200 for LLM inference, Part 2: vLLM, DeepSeek V4 Flash, and MTP

🔥PyTorch Blog

dnhkng.github.io

Your AI Factory Won't Scale to Inference: Here's Why | Ari Weil, Akamai

🚀MLOps Video

DiffusionGemma: 4x Faster Text Generation

🎮NVIDIA News Blog

blog.google··Hacker News, r/LocalLLaMA, r/singularity

Nvidia DGX Spark GB10 – AI Models and Guide with vLLM and Autonomous Script

🎮NVIDIA Code

github.com··Hacker News

DiffusionGemma: The Developer Guide - Google Developers Blog

🎯Tensor Cores Blog

developers.googleblog.com··r/LocalLLaMA

When your data model is the bottleneck: lessons from Medium’s feature store

⚡ONNX Runtime

thenewstack.io
