The Energy Consumption of Transformer Fine-Tuning: A Roofline-Inspired Scaling Model
Transformer-based models underpin modern natural language processing but incur rapidly growing computational and energy costs. As training scales in both model size and parallelism, accurately predicting energy consumption has become critical for sustainable and cost-aware system design. We present a framework for modeling the energy consumption of Transformer training on multiple GPUs. Using controlled architectural sweeps of BERT models, we ...