Design of a Feasible Wireless MAC Communication Protocol via Multi-Agent Reinforcement Learning
Miuccio, Luciano; Riolo, Salvatore; Bennis, Mehdi; Panno, Daniela (2024-08-15)
IEEE
L. Miuccio, S. Riolo, M. Bennis and D. Panno, "Design of a Feasible Wireless MAC Communication Protocol via Multi-Agent Reinforcement Learning," 2024 IEEE International Conference on Machine Learning for Communication and Networking (ICMLCN), Stockholm, Sweden, 2024, pp. 94-100, doi: 10.1109/ICMLCN59089.2024.10624759
https://rightsstatements.org/vocab/InC/1.0/
© 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:oulu-202502061485
Abstract
In future beyond-5G (B5G) and 6G wireless networks, automatically learning a medium access control (MAC) communication protocol via the multi-agent reinforcement learning (MARL) paradigm has been receiving much attention. The proposals available in the literature show promising simulation results. However, they were designed to run in computer simulations, where an environment gives observations and rewards to the agents while neglecting the communication overhead. As a result, these solutions either cannot be implemented in real-world scenarios as they are, or require huge additional costs. In this paper, we focus on this feasibility problem. First, we provide a new description of the main learning schemes available in the literature from the perspective of feasibility in practical scenarios. Then, we propose a new feasible MARL-based learning framework that goes beyond the concept of an omniscient environment. We properly model a feasible Markov decision process (MDP), identify which physical entity calculates the reward, and specify how the reward is provided to the learning agents. The proposed learning framework is designed to reduce the impact on communication resources, while better exploiting the available information to learn efficient MAC protocols. Finally, we compare the proposed feasible framework against other solutions in terms of training convergence and the communication performance achieved by the learned MAC protocols. The simulation results show that our feasible system achieves performance in line with the unfeasible solutions.
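The feasibility concern summarized in the abstract can be illustrated with a toy sketch (not the paper's algorithm; all names, parameters, and the reward rule below are illustrative assumptions): two independent Q-learning agents learn a slotted-ALOHA-like MAC, a base station computes the reward (success means exactly one agent transmits in the slot), and every reward delivered back to an agent is counted as a signaling message — the communication overhead that simulation-only schemes neglect and that a feasible framework must account for.

```python
import random

# Toy sketch: independent Q-learning for a two-agent slotted MAC.
# The reward is computed at a (hypothetical) base station and fed back
# to each agent; each feedback message is tallied as signaling overhead.
random.seed(0)

N_AGENTS = 2
ACTIONS = (0, 1)        # 0 = stay silent, 1 = transmit
EPS, ALPHA = 0.1, 0.2   # exploration rate, learning rate

# One stateless (single-state) Q-table per agent.
Q = [{a: 0.0 for a in ACTIONS} for _ in range(N_AGENTS)]
overhead_messages = 0   # reward-feedback messages sent over the air

def base_station_reward(actions):
    # Success iff exactly one agent transmits (no collision, no idle slot).
    return 1.0 if sum(actions) == 1 else 0.0

for step in range(5000):
    # Epsilon-greedy action selection by each agent.
    acts = [random.choice(ACTIONS) if random.random() < EPS
            else max(ACTIONS, key=Q[i].get) for i in range(N_AGENTS)]
    r = base_station_reward(acts)
    overhead_messages += N_AGENTS        # one feedback message per agent
    for i in range(N_AGENTS):
        # Stateless Q-learning update toward the shared reward.
        Q[i][acts[i]] += ALPHA * (r - Q[i][acts[i]])

# Greedy policies after training: the agents should anti-coordinate,
# i.e., exactly one of them transmits.
greedy = [max(ACTIONS, key=Q[i].get) for i in range(N_AGENTS)]
print(greedy, overhead_messages)
```

Even in this toy setting, reward delivery costs two messages per slot; the paper's contribution is precisely to model where this reward is computed and how cheaply it can reach the agents.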
Collections
- Open access [38840]