📊 Model Serving Economics - emschwartz · Scour

BOute: Cost-Efficient LLM Serving with Heterogeneous LLMs and GPUs via Multi-Objective Bayesian Optimization

arxiv.org·2d

🏗️LLM Infrastructure

How low-bit inference enables efficient AI

dropbox.tech·3h·

Discuss: Hacker News

🧠LLM Inference

You are probably overpaying for intelligence

residuals.bearblog.dev·17h

🏆LLM Benchmarking

Leading Inference Providers Cut AI Costs by up to 10x With Open Source Models on NVIDIA Blackwell

blogs.nvidia.com·1d

📱Edge AI Optimization

GPU-Serving Two-Tower Models for Lightweight Ads Engagement Prediction

medium.com·15h

Data Engineering for Large Models: Architecture, Algorithms & Projects

github.com·13h·

Discuss: Hacker News

🏗️LLM Infrastructure

Nvidia’s new technique cuts LLM reasoning costs by 8x without losing accuracy

venturebeat.com·1d·

Discuss: r/LocalLLaMA

🏗️LLM Infrastructure

AI Inference Needs A Mix-And-Match Memory Strategy

semiengineering.com·2d

🏗️LLM Infrastructure

Introducing Dedicated Container Inference: Delivering 2.6x faster inference for custom AI models

together.ai·2d

🏗️LLM Infrastructure

Scaling LLM Post-Training at Netflix

netflixtechblog.com·1d

🏗️LLM Infrastructure

Breaking the Tractability Barrier: A Generic Low-Level Solver for NP-Hard Instances (N=63) on Commodity 64-Bit Silicon

zenodo.org·1d·

Discuss: r/programming

🧮SMT Solvers

BalatroBench Benchmarks Large Language Models Playing Balatro

balatrobench.com·1d·

Discuss: Hacker News

🏗️LLM Infrastructure

Distinguish between inference scaling and "larger tasks use more compute"

lesswrong.com·2d

🧠LLM Inference

Supercharging Inference for AI Factories: KV Cache Offload as a Memory-Hierarchy Problem

blog.min.io·1d

🏗️LLM Infrastructure

DeepSeek-V3.2 on GB300: Performance Breakthrough

blog.vllm.ai·1d

🏗️LLM Infrastructure

London’s Nscale signs €1.1 billion debt facility to deploy large-scale GPU clusters in Europe

europedigital.cloud·1d

Benchmarking for Single Feature Attribution with Microarchitecture Cliffs

arxiv.org·1d

⚡Systems Performance

Two AI Economies, Two Outcomes

elmerdata.bearblog.dev·1d

Owning the AI Pareto Frontier

latent.space·1d

Nvidia’s Upstart Rivals See Cracks in AI Chip Market Leader’s Dominance

bloomberg.com·1d