📊 Model Serving Economics - emschwartz · Scour

IBM Granite has earned a reputation for transparency

research.ibm.com·5h

🏗️LLM Infrastructure

Introducing GPT‑5.3‑Codex‑Spark

simonwillison.net·21h

NVIDIA DGX Spark Powers Big Projects in Higher Education

blogs.nvidia.com·1d

ChatGPT-5.3-Codex Is Also Good At Coding

lesswrong.com·2h

🏗️LLM Infrastructure

ShareChat hit a billion features per second, then it had to make it 10x cheaper

thenewstack.io·1d

🏗️Infrastructure Economics

Deterministic Inference with EigenAI

deterministicinference.com·2d

🧠LLM Inference

Parallel Track Transformers: Enabling Fast GPU Inference with Reduced Synchronization

machinelearning.apple.com·3d

📦Batch Embeddings

The cure for the AI hype hangover

infoworld.com·10h

Extending Puzzle for Mixture-of-Experts Reasoning Models with Application to GPT-OSS Acceleration

arxiv.org·14h

🧠LLM Inference

harishsg993010/tiny-NPU: opensource NPU for LLM inference (this run gpt2)

github.com·23h·Discuss: r/LocalLLaMA

🏗️LLM Infrastructure

Simulate Faster with SimAI Software for High Returns at a Low Cost of Ownership

semiengineering.com·2d

Building an Embedding API with Rust, Arm, and EmbeddingGemma on AWS Lambda

sobolev.substack.com·8h·Discuss: Substack

[AINews] Z.ai GLM-5: New SOTA Open Weights LLM

latent.space·1d

🏗️LLM Infrastructure

Pricing · Cloudflare Workers AI docs

developers.cloudflare.com·4d

Dario Amodei — The highest-stakes financial model in history

dwarkesh.com·2h·Discuss: Hacker News

Code, Compute and Connection: Inside the Inaugural NVIDIA AI Day São Paulo

blogs.nvidia.com·21h

The Task Supply Problem

charlielabs.ai·4h·Discuss: Hacker News

Software at the speed of AI

infoworld.com·2d

Amortised and provably-robust simulation-based inference

arxiv.org·14h

🏗️LLM Infrastructure

Benchmark & Compare the Best AI Models

arena.ai·2d

🏆LLM Benchmarking
