arXiv:2601.12662v1 Announce Type: new Abstract: We address real-time sampling and estimation of autoregressive Markovian sources in dynamic yet structurally similar multi-hop wireless networks. Each node caches samples from others and communicates over wireless collision channels, aiming to minimize time-average estimation error via decentralized policies. Due to the high dimensionality of action spaces and complexity of network topologies, deriving optimal policies analytically is intractable. To address this, we propose a graphical multi-agent reinforcement learning framework for policy optimization. Theoretically, we demonstrate that our proposed policies are transferable, allowing a policy trained on one graph to be effectively applied to structurally similar graphs. Numerical experime…
arXiv:2601.12662v1 Announce Type: new Abstract: We address real-time sampling and estimation of autoregressive Markovian sources in dynamic yet structurally similar multi-hop wireless networks. Each node caches samples from others and communicates over wireless collision channels, aiming to minimize time-average estimation error via decentralized policies. Due to the high dimensionality of action spaces and complexity of network topologies, deriving optimal policies analytically is intractable. To address this, we propose a graphical multi-agent reinforcement learning framework for policy optimization. Theoretically, we demonstrate that our proposed policies are transferable, allowing a policy trained on one graph to be effectively applied to structurally similar graphs. Numerical experiments demonstrate that (i) our proposed policy outperforms state-of-the-art baselines; (ii) the trained policies are transferable to larger networks, with performance gains increasing with the number of agents; (iii) the graphical training procedure withstands non-stationarity, even when using independent learning techniques; and (iv) recurrence is pivotal in both independent learning and centralized training and decentralized execution, and improves the resilience to non-stationarity.