
A new technical paper titled “Hardware Acceleration for Neural Networks: A Comprehensive Survey” was published by researchers at Arizona State University.
Abstract: “Neural networks have become a dominant computational workload across cloud and edge platforms, but their rapid growth in model size and deployment diversity has exposed hardware bottlenecks that are increasingly dominated by memory movement, communication, and irregular operators rather than peak arithmetic throughput. This survey reviews the current technology landscape for hardware acceleration of deep learning, spanning GPUs and tensor-core architectures, domain-specific accelerators (e.g., TPUs/NPUs), FPGA-based designs, ASIC inference engines, and emerging LLM-serving accelerators such as LPUs (language processing units), alongside in-/near-memory computing and neuromorphic/analog approaches. We organize the space using a unified taxonomy across (i) workloads (CNNs, RNNs, GNNs, Transformers/LLMs), (ii) execution settings (training vs. inference; datacenter vs. edge), and (iii) optimization levers (reduced precision, sparsity and pruning, operator fusion, compilation and scheduling, and memory-system/interconnect design). We synthesize key architectural ideas such as systolic arrays, vector and SIMD engines, specialized attention and softmax kernels, quantization-aware datapaths, and high-bandwidth memory, and we discuss how software stacks and compilers bridge model semantics to hardware. Finally, we highlight open challenges—including efficient long-context LLM inference (KV-cache management), robust support for dynamic and sparse workloads, energy- and security-aware deployment, and fair benchmarking—pointing to promising directions for the next generation of neural acceleration.”
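As a purely illustrative aside (not code from the paper), the sketch below shows the idea behind one optimization lever the abstract names, reduced-precision inference with quantization-aware datapaths: weights are stored as 8-bit integers with a single scale factor, products are accumulated in a wide integer register, and the result is dequantized once at the end. The function names (quantize_int8, int8_linear) and the NumPy formulation are ours, not the authors'.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: w ~= scale * q with q in [-127, 127]."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_linear(x_int, q_w, scale):
    """Integer matmul accumulated in int32, dequantized to float once at the end."""
    acc = x_int.astype(np.int32) @ q_w.astype(np.int32)  # wide accumulator, as in INT8 hardware datapaths
    return acc * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 32)).astype(np.float32)          # toy weight matrix
x_int = rng.integers(-127, 128, size=(8, 64))              # activations assumed already in int8 range
q_w, s = quantize_int8(w)

ref = x_int @ w                                            # float reference
out = int8_linear(x_int, q_w, s)                           # quantized path
print("max relative error:", np.abs(ref - out).max() / np.abs(ref).max())
```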
Find the technical paper here. Published December 2025.
Xu, Bin, Ayan Banerjee, and Sandeep Gupta. “Hardware Acceleration for Neural Networks: A Comprehensive Survey.” arXiv:2512.23914 (2025).