tensor parallelism, pipeline parallelism, distributed training, sharding