RT by @awnihannun: oMLX deserves so much more attention than it gets.
oMLX deserves so much more attention than it gets. (My own NVFP4 implementation.) 18k context on a MacPro M4: prefill 11,565 tok/s · gen 51 tok/s · ttft 0.57s. NVDA format, running natively on Apple Silicon. Beautiful framework to work with and build on.