Running AI on mixed hardware for speed and affordability
Researchers show that serving AI models with llm-d can boost inference speeds by up to 5 times and double throughput, all while running on heterogeneous GPUs.