I've updated my glorified Llama fork (LLM Inference Server) for P40s to utilise MTP + TurboQuant + DFlash
Unified GPU inference server: Qwen 3.5 + Whisper + TimesFM 2.5 on Tesla P40 - Sakatard/llm-inference-server