RT by @awnihannun: A long time coming but new mlx-lm is here with better batching support in the server and Gemma 4. (opens in new tab)
A long time coming but new mlx-lm is here with better batching support in the server and Gemma 4. pip install -U mlx-lm Here is a video where a single M3 Ultra serves 5 opencode sessions with Gemma 4 26B that process ~130k tokens in ~1.5 minutes. Video
Read the original article