QL-LSTM: A Parameter-Efficient LSTM for Stable Long-Sequence Modeling

Computer Science > Machine Learning

arXiv:2512.06582 (cs)

Abstract: Recurrent neural architectures such as LSTM and GRU remain widely used in sequence modeling, but they continue to face two core limitations: redundant gate-specific parameters and reduced ability to retain information across long temporal distances. This paper introduces the Quantum-Leap LSTM (QL-LSTM), a recurrent architecture designed to address both challenges through two independent components. The Parameter-Shared Unified Gating (PSUG) mechanism replaces all gate-specific transformations with a single shared weight matrix, reducing parameters by approximately 48 percent while preserving full gating behavior. The Hierarchical Gated Recurrence with Additive Skip Connections (HGR-ASC) component adds a multiplication-free pathway that improves long-range information flow and reduces forget-gate degradation. We evaluate QL-LSTM on sentiment classification using the IMDB dataset with extended document lengths, comparing it to LSTM, GRU, and BiLSTM reference models. QL-LSTM achieves competitive accuracy while using substantially fewer parameters. Although the PSUG and HGR-ASC components are more efficient per time step, the current prototype remains limited by the inherent sequential nature of recurrent models and therefore does not yet yield wall-clock speed improvements without further kernel-level optimization.
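To make the parameter-sharing idea concrete, the sketch below counts cell parameters for a standard LSTM and for a hypothetical shared-projection cell. The abstract does not specify the PSUG layout, so the per-gate elementwise scale and offset in `shared_gate_params`, along with all function names, are assumptions made for illustration. The saving it computes should not be read as the paper's reported figure.

```python
def lstm_params(input_dim: int, hidden_dim: int) -> int:
    """Standard LSTM cell: four gates, each with its own weights and bias."""
    per_gate = (input_dim + hidden_dim) * hidden_dim + hidden_dim
    return 4 * per_gate


def shared_gate_params(input_dim: int, hidden_dim: int) -> int:
    """Hypothetical shared-projection cell (an assumption, not the paper's layout).

    One weight matrix and bias are shared by all four gates; each gate adds
    only an elementwise scale and offset so the gates can still differ.
    """
    shared = (input_dim + hidden_dim) * hidden_dim + hidden_dim
    per_gate_modulation = 4 * 2 * hidden_dim  # scale + offset for each gate
    return shared + per_gate_modulation


def reduction(input_dim: int, hidden_dim: int) -> float:
    """Fraction of cell parameters saved relative to the standard LSTM."""
    base = lstm_params(input_dim, hidden_dim)
    return 1.0 - shared_gate_params(input_dim, hidden_dim) / base


if __name__ == "__main__":
    d, h = 128, 256  # illustrative sizes, not taken from the paper
    print(lstm_params(d, h), shared_gate_params(d, h), round(reduction(d, h), 4))
```

In this toy layout the cell-level saving approaches 75 percent as the hidden size grows. The roughly 48 percent reported in the abstract refers to the authors' own design and evaluation setup, which the abstract does not detail.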


Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Cite as:	arXiv:2512.06582 [cs.LG]
	(or arXiv:2512.06582v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2512.06582 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Isaac Kofi Nti Dr
[v1] Sat, 6 Dec 2025 22:29:19 UTC (980 KB)
