Getting a feel for how fast X tokens/second really is.

Covered by 5 sources including ruanyifeng.com and Simon Willison's Newsletter. Discussed on Hacker News and r/LocalLLaMA.

Every local-LLM benchmark reports throughput: "47 tok/s on an M3," "180 tok/s on a 4090," "500 tok/s on Groq." Unless you've actually watched tokens stream at those rates, the numbers are hard to internalize. This is a rendering of what those rates look like.
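To get a rough feel for these rates in a terminal, a minimal sketch along the following lines may help. It is not the original article's implementation. The names `token_delay` and `stream_text` are hypothetical, and the sketch assumes that one whitespace-separated word stands in for one token, which real tokenizers do not guarantee.

```python
import sys
import time


def token_delay(tokens_per_second):
    """Seconds to wait between tokens at a given throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return 1.0 / tokens_per_second


def stream_text(text, tokens_per_second, sleep=time.sleep, out=None):
    """Write text one pseudo-token at a time, pausing after each one.

    Assumption: a "token" is a whitespace-separated word. Real tokenizers
    split text differently, so this only approximates the visual effect.
    Returns the number of pseudo-tokens written.
    """
    if out is None:
        out = sys.stdout
    delay = token_delay(tokens_per_second)
    words = text.split()
    for i, word in enumerate(words):
        # Separate words with a space and end the stream with a newline.
        out.write(word + (" " if i < len(words) - 1 else "\n"))
        out.flush()  # flush so each token appears immediately
        sleep(delay)
    return len(words)


if __name__ == "__main__":
    sample = "The quick brown fox jumps over the lazy dog. " * 5
    for rate in (47, 180, 500):  # the rates quoted in the text
        print(f"--- {rate} tok/s ---")
        stream_text(sample, rate)
```

The `sleep` and `out` parameters are injectable so the pacing can be checked without real waiting. Run as a script, it prints the same sample at each of the three quoted rates.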


Covered in 5 articles

Simon Willison's Newsletter
Datasette Agent: an AI assistant for Datasette built on LLM
Discussed on Substack

Alex Hyett
What skills are still relevant?

simonwillison.net
How fast is 10 tokens per second really?