Pipeline-parallel LLM inference across GPUs on separate machines
Pipeline-parallel LLM inference across GPUs on separate machines. - leyten/shard