I forked ik_llama.cpp and added a "--numa mirror" mode to maximize performance on multi-socket CPU systems. Just sharing and looking for testers!
ik_llama.cpp fork with experimental NUMA per-node mirroring of model weights and KV cache. This is an attempt to maximize CPU inference performance on multi-socket systems. - mikechambers84/ik_llam...