20288

#Python

qwen3_5_moe: add CUDA Engine/Session execution path

Add a Qwen3.5-MoE execution adapter on top of the new LLMEngine/LLMSession and CUDA mutable-state foundations. The engine loads one physical model, registers the exported mutable-buffer metadata, and creates isolated sessions. Each session rebinds its own KV-cache, convolution, and recurrent state before execution, while all sessions share the model weights. Keep the existing CLI behavior by making main.cpp a thin wrapper over the engine/session path. The export now records the m...
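The engine/session split described above can be illustrated with a small, framework-free sketch. It assumes a toy model: `Engine`, `Session`, `BufferSpec`, `bind`, and `execute` are hypothetical names, not the PR's actual LLMEngine/LLMSession API. Plain Python lists stand in for CUDA buffers, and a trivial update stands in for the forward pass. The pattern it shows is that weights are loaded once on the engine, buffer metadata is registered once, and each session owns its own state, which it rebinds before every step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BufferSpec:
    # Hypothetical stand-in for exported mutable-buffer metadata.
    name: str  # e.g. "kv_cache", "conv_state", "recurrent_state"
    size: int  # number of elements in the buffer


class Engine:
    """Loads the model once; weights are shared by every session."""

    def __init__(self, weights, specs):
        self.weights = weights      # shared, treated as read-only
        self.specs = list(specs)    # registered mutable-buffer metadata
        self._bound = None          # state currently bound for execution

    def create_session(self):
        return Session(self)

    def bind(self, state):
        # Point the engine at one session's buffers (real code would
        # swap device pointers instead of a Python dict reference).
        self._bound = state

    def execute(self, token):
        # Toy "forward pass": reads shared weights, mutates bound state,
        # and returns the total of all bound buffers.
        if self._bound is None:
            raise RuntimeError("no session state bound")
        for spec in self.specs:
            self._bound[spec.name][token % spec.size] += self.weights["scale"]
        return sum(sum(buf) for buf in self._bound.values())


class Session:
    """Owns isolated mutable state allocated from the engine's specs."""

    def __init__(self, engine):
        self.engine = engine
        self.state = {s.name: [0.0] * s.size for s in engine.specs}

    def step(self, token):
        self.engine.bind(self.state)  # rebind this session's state first
        return self.engine.execute(token)


engine = Engine(
    {"scale": 1.0},
    [BufferSpec("kv_cache", 4), BufferSpec("conv_state", 2),
     BufferSpec("recurrent_state", 3)],
)
a = engine.create_session()
b = engine.create_session()
print(a.step(0), a.step(1), b.step(0))
```

Because each session rebinds its own buffers, stepping session `a` twice does not change what session `b` sees on its first step. Both sessions still read the same `weights` object.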
