Ask HN: What are some good/fast coding models for Apple Silicon?
I have an M4 Max with 128 GB of unified memory, and I thought it would be easy to reach decent inference speeds with it. After a few failed attempts to get past about 150 tokens/s, even with completely custom Metal inference engines tailor-built by Claude, I'm stumped.