Quantization, Attention Mechanisms, Batch Processing, KV Caching
Detailed Study of Performance Modeling For LLM Implementations At Scale (imec)
semiengineering.com·3h
Economics of Claude 3 Inference
lesswrong.com·5h
A Conversation with Val Bercovici about Disaggregated Prefill / Decode
fabricatedknowledge.com·3h
Using a Framework Desktop for local AI
frame.work·4h
Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
machinelearning.apple.com·23h
AI cloud infrastructure gets faster and greener: NPU core improves inference performance by over 60%
techxplore.com·2h
Looking for the Number 4 Inside the Black Box
determinedquokka.bearblog.dev·19h
Understanding PPO for LLMs
stpn.bearblog.dev·19h
Approach improves how new skills are taught to large language models
techxplore.com·10h
Don’t Build Chatbots — Build Agents With Jobs
thenewstack.io·8h