⚙️ Inference
model inference, serving, quantization, throughput, vLLM
Scoured 149,456 posts in 11.4 ms

Overcoming inference challenges
🔀 LoRA · redhat.com · 3d

Scheduling the Unschedulable: Taming Black-Box LLM Inference at Scale
🚀 MLOps · arxiv.org · 1d

The Engine Behind Modern LLM Inference, Part 1: Continuous Batching, PagedAttention, and the End of…
💬 LLMs · medium.com · 15h

F&S M.2 AI Accelerator Uses NXP Ara-240 for Edge Inference Workloads
📊 AI Evals · linuxgizmos.com · 4h

Inside LLM Inference: KV Cache, Prefill, and the Decode Bottleneck
💬 LLMs · pub.towardsai.net · 1d

milanm/AutoGrad-Engine: A complete GPT language model (training and inference) in ~600 lines of pure C#, zero dependencies
💬 LLMs · github.com · 17h · Hacker News

LLM inference engine from scratch in C++
💬 LLMs · anirudhsathiya.com · 4d · Hacker News

I Ran My KYB Engine at Three Quantization Levels. Accuracy Didn't Move. Cost Dropped 6x.
📊 AI Evals · walsenburgtech.com · 15h · Hacker News

We Put a Gaming Box in the Inference Loop
📊 AI Evals · write.as · 2d

Prediction: The "Inference Supercycle" Could Be Bigger Than the Training Boom. 1 Growth Stock to Own.
🔀 LoRA · finance.yahoo.com · 15h

RetroInfer: A Vector Storage Engine for Scalable Long-Context LLM Inference
💬 LLMs · vldb.org · 1d

Inference Arena – new benchmark of local inference and training
🚀 MLOps · kvark.github.io · 4d · Hacker News

New course: Efficient Inference with SGLang: Text and Image Generation, built in partnership with LMSys @lmsysorg and RadixArk @radixark, and taught by Richard ...
💬 LLMs · twitter.macworks.dev · 15h

Building the Blueprint for Premium Inference
📊 AI Evals · sambanova.ai · 1d

How to achieve P90 sub-microsecond latency in a C++ FIX engine
🎯 Fine-Tuning · akinocal1.substack.com · 11h · Substack

The case for Model-as-a-Service over self-managed inference
🚀 MLOps · news.ycombinator.com · 3d · Hacker News

Attn-QAT: Making 4-Bit Attention Actually Work
🎯 Fine-Tuning · haoailab.com · 1d

Meta’s Muse Spark: a smaller, faster AI model for broad app deployment
📊 AI Evals · infoworld.com · 16h

UCCL-EP: Portable Expert-Parallel Communication
🚀 MLOps · uccl-project.github.io · 2d · Hacker News

TurboQuant Is Quietly Solving LLM Inference’s Worst Memory Problem
💬 LLMs · medium.com · 5d