We built the fastest API for GLM-5.2 (280 TPS)
GLM-5.2 at 280+ tokens per second on NVIDIA Blackwell with KV-aware routing, prefill/decode (PD) disaggregation, Multi-Token Prediction, NVFP4, and other optimizations