arXiv:2601.22862v1 Announce Type: new Abstract: Heterogeneous architectures can deliver higher performance and energy efficiency than symmetric counterparts by using multiple architectures tuned to different types of workloads. While previous works focused on CPUs, this work extends the concept of heterogeneity to GPUs by proposing KHEPRI, a heterogeneous GPU architecture for graphics applications. Scenes in graphics applications showcase diversity, as they consist of many objects with varying levels of complexity. As a result, computational intensity and memory bandwidth requirements differ significantly across different regions of each scene. To address this variability, our proposal includes two types of cores: cores optimized for high ILP (compute-specialized) and cores that tolerate a…
arXiv:2601.22862v1 Announce Type: new Abstract: Heterogeneous architectures can deliver higher performance and energy efficiency than symmetric counterparts by using multiple architectures tuned to different types of workloads. While previous works focused on CPUs, this work extends the concept of heterogeneity to GPUs by proposing KHEPRI, a heterogeneous GPU architecture for graphics applications. Scenes in graphics applications showcase diversity, as they consist of many objects with varying levels of complexity. As a result, computational intensity and memory bandwidth requirements differ significantly across different regions of each scene. To address this variability, our proposal includes two types of cores: cores optimized for high ILP (compute-specialized) and cores that tolerate a higher number of simultaneously outstanding cache misses (memory-specialized). A key component of the proposed architecture is a novel work scheduler that dynamically assigns each part of a frame (i.e., a tile) to the most suitable core. Designing this scheduler is particularly challenging, as it must preserve data locality; otherwise, the benefits of heterogeneity may be offset by the penalty of additional cache misses. Additionally, the scheduler requires knowledge of each tile’s characteristics before rendering it. For this purpose, KHEPRI leverages frame-to-frame coherence to predict the behavior of each tile based on that of the corresponding tile in the previous frame. Evaluations across a wide range of commercial animated graphics applications show that, compared to a traditional homogeneous GPU, KHEPRI achieves an average performance improvement of 9.2%, a throughput increase (frames per second) of 7.3%, and a total GPU energy reduction of 4.8%. Importantly, these benefits are achieved without any hardware overhead.