DWM-RO: Decentralized World Models with Reasoning Offloading for SWIPT-enabled Satellite-Terrestrial HetNets

Computer Science > Distributed, Parallel, and Cluster Computing

arXiv:2511.05972 (cs)

Abstract: Wireless networks are undergoing a paradigm shift toward massive connectivity with energy-efficient operation, driving the integration of satellite-terrestrial architectures with simultaneous wireless information and power transfer (SWIPT). Optimizing transmit beamforming and power splitting in such systems is challenging: time-varying channels and multi-tier interference create a complex decision landscape in which conventional model-free multi-agent reinforcement learning (MARL) suffers from sample inefficiency, because some state transitions are rarely encountered, and from poor coordination, because decentralized agents act independently. This paper proposes the Decentralized World Model with Reasoning Offloading (DWM-RO) framework to address these limitations. Specifically, each agent employs a world model to learn compact predictive representations of environment dynamics, enabling imagination-based policy training that reduces the number of environment interactions required. An uncertainty-aware offloading gate monitors local interference levels and model reconstruction errors to trigger selective edge coordination. When activated, a lightweight latent decorrelation mechanism at the edge refines agents’ strategic representations, guiding them toward orthogonal actions that minimize resource conflicts. Extensive simulations demonstrate that DWM-RO converges 5 times faster than state-of-the-art baselines while achieving 34.7% higher spectral efficiency and reducing constraint violations by 40%. In dense network scenarios with 10 users, DWM-RO maintains violation rates below 20% while baselines exceed 70%, indicating superior robustness.


Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2511.05972 [cs.DC]
	(or arXiv:2511.05972v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2511.05972 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Guangyuan Liu
[v1] Sat, 8 Nov 2025 11:28:58 UTC (1,525 KB)
