The three types of LLM workloads and how to serve them
modal.com·7h·
Discuss: Hacker News
💻Local LLMs
Preview
Report Post

We hold this truth to be self-evident: not all workloads are created equal.

But for large language models, this truth is far from universally acknowledged. Most organizations building LLM applications get their AI from an API, and these APIs hide the varied costs and engineering trade-offs of distinct workloads behind deceptively flat per-token pricing.

The truth, however, will out. The era of model API dominance is ending, thanks to excellent work on open source models by DeepSeek and Alibaba Qwen (eroding the benefits of proprietary model APIs like OpenAI’s) and excellent work on open source inference engines like vLLM and SGLang (eroding the benefits of open model APIs powered by proprietary inference engines).

E…

Similar Posts

Loading similar posts...

Keyboard Shortcuts

Navigation
Next / previous item
j/k
Open post
oorEnter
Preview post
v
Post Actions
Love post
a
Like post
l
Dislike post
d
Undo reaction
u
Recommendations
Add interest / feed
Enter
Not interested
x
Go to
Home
gh
Interests
gi
Feeds
gf
Likes
gl
History
gy
Changelog
gc
Settings
gs
Browse
gb
Search
/
General
Show this help
?
Submit feedback
!
Close modal / unfocus
Esc

Press ? anytime to show this help