🚀 Model Serving
TorchServe, TensorFlow Serving, Inference Optimization, Batching
Scoured 184,204 posts in 59.5 ms
Distributed Generative Inference of LLM at Internet Scales with Multi-Dimensional Communication Optimization
🧮 Vector Databases · arxiv.org · 6d

Reiner Pope – The math behind how LLMs are trained and served
🧠 Deep Learning · dwarkesh.com · 1d

Speculative Decoding vs MoE: 3.2x Cost Gap on Llama 3
🔨 LLVM · tildalice.io · 3d

AmSach/kvquant: Drop-in KV cache compressor for local LLM inference - Run 70B models on 8GB RAM
🔨 LLVM · github.com · 6h · DEV

From $200 to $30: Five Layers of LLM Cost Optimization
🛠️ Feature Engineering · blog.dwornikowski.com · 6d · Hacker News

Prefetching Weights in llama.cpp
🔨 LLVM · am17an.bearblog.dev · 2d

a16z: Large Model Deployment = Forgetting—Can “Continual Learning” Break This Vicious Cycle?
🛠️ Feature Engineering · techflowpost.com · 6d

Paper page - Large Language Models Explore by Latent Distilling
🤖 Transformers · huggingface.co · 3h

How to Deploy a Serverless Spam Classifier Using Scikit-Learn, AWS Lambda, & API Gateway
🤖 Machine Learning · freecodecamp.org · 13h

AutoSP: Long-Context LLM Training via Compiler-Based Sequence Parallelism
🧠 Deep Learning · pytorch.org · 23h · Hacker News

Show HN: I built a 2nd-order PyTorch optimizer for LLMs that runs on 16GB GPUs
🔨 LLVM · news.ycombinator.com · 1d · Hacker News

From local prototyping to GPUs in the GCP cloud: Creating a satellite image classification system…
🧠 Deep Learning · medium.com · 15h

Incompressible Knowledge Probes: Estimating Black-Box LLM Parameter Counts via Factual Capacity
🛠️ Feature Engineering · arxiv.org · 1d · Hacker News

My local agentic dev setup today
🔨 LLVM · willemvandenende.com · 3h · Hacker News

Asynchronously Filling & Evicting Caches
⏱️ Async Programming · dgtlgrove.com · 16h

DFIR + AI: Using Local LLMs with DFIR MCP Servers
🤖 AI · cybertriage.com · 19m

Dedicated vs Serverless Inference as You Scale
🔄 Concurrency · digitalocean.com · 1d

PyTorch Lightning project quarantined by PyPI
📦 uv · pypi.org · 3h · Hacker News

Machine Learning Developers: Why Most ML Projects Fail After the Model Stage
🤖 Machine Learning · artificialintelligence.oodles.io · 6h · DEV

The Inference Economy: Token Use
🛠️ Feature Engineering · frontierai.substack.com · 18m · Substack