Learning-Based Channel Access in Wi-Fi: A Multi-Armed Bandit Approach

View PDF HTML (experimental)

Abstract:Due to its static protocol design, IEEE 802.11 (aka Wi-Fi) channel access lacks adaptability to address dynamic network conditions, resulting in inefficient spectrum utilization, unnecessary contention, and packet collisions. This paper investigates reinforcement learning (RL) solutions to optimize Wi-Fi’s medium access control (MAC). In particular, a multi-armed bandit (MAB) framework is proposed for dynamic channel access (including both the primary channel and channel width) and contention window (CW) adjustment. In this setting, we study relevant learning design principles such as adopting joint or factorial action spaces (handled by a single agent (SA…

View PDF HTML (experimental)

Abstract:Due to its static protocol design, IEEE 802.11 (aka Wi-Fi) channel access lacks adaptability to address dynamic network conditions, resulting in inefficient spectrum utilization, unnecessary contention, and packet collisions. This paper investigates reinforcement learning (RL) solutions to optimize Wi-Fi’s medium access control (MAC). In particular, a multi-armed bandit (MAB) framework is proposed for dynamic channel access (including both the primary channel and channel width) and contention window (CW) adjustment. In this setting, we study relevant learning design principles such as adopting joint or factorial action spaces (handled by a single agent (SA) and multiple agents (MA), respectively) and the importance of incorporating contextual information. Our simulation results show that cooperative MA architectures converge faster than their SA counterparts, as agents operate over smaller action spaces. Another key insight is that contextual MAB algorithms consistently outperform non-contextual ones, highlighting the value of leveraging side information in action selection. Moreover, in multi-player settings, results demonstrate that decentralized learners can achieve implicit coordination, although their greediness may degrade coexisting networks’ performance and induce policy-chasing dynamics. Overall, these findings demonstrate that (contextual) MAB-based learning offers a practical and adaptive alternative to static IEEE 802.11 protocols, enabling more efficient and intelligent spectrum utilization.


Comments:	preprint
Subjects:	Networking and Internet Architecture (cs.NI)
Cite as:	arXiv:2511.10143 [cs.NI]
	(or arXiv:2511.10143v1 [cs.NI] for this version)
	https://doi.org/10.48550/arXiv.2511.10143 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Miguel Casasnovas Bielsa [view email] [v1] Thu, 13 Nov 2025 09:57:13 UTC (2,172 KB)

Submission history

Similar Posts