BOA Constrictor: Squeezing Performance out of GPUs in the Cloud via Budget-Optimal Allocation

arXiv:2602.01404v1 Announce Type: new Abstract: The past decade has seen a dramatic increase in demand for GPUs to train Machine Learning (ML) models. Because it is prohibitively expensive for most organizations to build and maintain a large GPU cluster, organizations instead choose to rent GPUs from cloud providers. The customer is responsible for devising a policy for (i) deciding how many GPUs to rent at every moment in time to process a stream of ML training jobs and (ii) allocating the rented GPUs among the currently active jobs in the system. Because ML training jobs can be parallelized across different numbers of GPUs, the customer generally has many options for how many GPUs to use for each job. Allocating more GPUs to a single training job will cause the job to complete more quick…

Similar Posts