Computer Science > Machine Learning
arXiv:2512.06582 (cs)
Abstract:Recurrent neural architectures such as LSTM and GRU remain widely used in sequence modeling, but they continue to face two core limitations: redundant gate-specific parameters and reduced ability to retain information across long temporal distances. This paper introduces the Quantum-Leap LSTM (QL-LSTM), a recurrent architecture designed to address both challenges through two independent components. The Parameter-Shared Unified Gating mechanism replaces all gate-specific transformations with a single shared weight matrix, reducing parameters by approximately 48 percent while preserving full gating behavior. The Hierarchical Gated Recurrence with Additive Skip …
Computer Science > Machine Learning
arXiv:2512.06582 (cs)
Abstract:Recurrent neural architectures such as LSTM and GRU remain widely used in sequence modeling, but they continue to face two core limitations: redundant gate-specific parameters and reduced ability to retain information across long temporal distances. This paper introduces the Quantum-Leap LSTM (QL-LSTM), a recurrent architecture designed to address both challenges through two independent components. The Parameter-Shared Unified Gating mechanism replaces all gate-specific transformations with a single shared weight matrix, reducing parameters by approximately 48 percent while preserving full gating behavior. The Hierarchical Gated Recurrence with Additive Skip Connections component adds a multiplication-free pathway that improves long-range information flow and reduces forget-gate degradation. We evaluate QL-LSTM on sentiment classification using the IMDB dataset with extended document lengths, comparing it to LSTM, GRU, and BiLSTM reference models. QL-LSTM achieves competitive accuracy while using substantially fewer parameters. Although the PSUG and HGR-ASC components are more efficient per time step, the current prototype remains limited by the inherent sequential nature of recurrent models and therefore does not yet yield wall-clock speed improvements without further kernel-level optimization.
| Subjects: | Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE) |
| Cite as: | arXiv:2512.06582 [cs.LG] |
| (or arXiv:2512.06582v1 [cs.LG] for this version) | |
| https://doi.org/10.48550/arXiv.2512.06582 arXiv-issued DOI via DataCite (pending registration) |
Submission history
From: Isaac Kofi Nti Dr [view email] [v1] Sat, 6 Dec 2025 22:29:19 UTC (980 KB)
Current browse context:
cs.LG
Change to browse by:
export BibTeX citation