Luce KVFlash: 256K context with 72MiB of KV cache on the GPU

Covered by news.smol.ai. Discussed on Hacker News.

Luce-Org/lucebox-hub: Fast LLM speculative inference server for consumer hardware.

Covered in 2 articles

GLM 5.2: the top Frontend Coding model in the world, IndexShare reduces costs

not much happened today | AINews