Luce KVFlash: 256K context with 72MiB of KV cache on the GPU
Fast LLM speculative inference server for consumer hardware. - Luce-Org/lucebox-hub