Conservative and Risk-Aware Offline Multi-Agent Reinforcement Learning
Eldeeb, Eslam; Sifaou, Houssem; Simeone, Osvaldo; Shehab, Mohammad; Alves, Hirley (2024-11-14)
Publisher: IEEE
E. Eldeeb, H. Sifaou, O. Simeone, M. Shehab and H. Alves, "Conservative and Risk-Aware Offline Multi-Agent Reinforcement Learning," in IEEE Transactions on Cognitive Communications and Networking, vol. 11, no. 3, pp. 1913-1926, June 2025, doi: 10.1109/TCCN.2024.3499357
© 2024 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:oulu-202411226858
Abstract
Reinforcement learning (RL) has been widely adopted for controlling and optimizing complex engineering systems such as next-generation wireless networks. An important challenge in adopting RL is the need for direct access to the physical environment. This limitation is particularly severe in multi-agent systems, for which conventional multi-agent reinforcement learning (MARL) requires a large number of coordinated online interactions with the environment during training. When only offline data is available, a direct application of online MARL schemes would generally fail due to the epistemic uncertainty entailed by the lack of exploration during training. In this work, we propose an offline MARL scheme that integrates distributional RL and conservative Q-learning to address the environment’s inherent aleatoric uncertainty and the epistemic uncertainty arising from the use of offline data. We explore both independent and joint learning strategies. The proposed MARL scheme, referred to as multi-agent conservative quantile regression, addresses general risk-sensitive design criteria and is applied to the trajectory planning problem in drone networks, showcasing its advantages.
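To make the combination of distributional RL and conservative Q-learning described in the abstract concrete, the following is a minimal single-agent sketch, not the authors' implementation: it trains a quantile-regression Q-network on offline data with a CQL-style penalty that suppresses Q-values of out-of-distribution actions, and selects actions with a risk-aware CVaR rule. All network sizes, hyper-parameters, and the `cvar_greedy_action` helper are illustrative assumptions.

```python
# Hypothetical sketch: quantile-regression Q-learning + CQL penalty + CVaR action rule.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_QUANTILES, N_ACTIONS, STATE_DIM = 32, 4, 8        # assumed sizes
GAMMA, CQL_ALPHA, CVAR_FRACTION = 0.99, 1.0, 0.25   # assumed hyper-parameters

class QuantileQNet(nn.Module):
    """Outputs N_QUANTILES return quantiles for each action."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(),
                                  nn.Linear(128, N_ACTIONS * N_QUANTILES))

    def forward(self, s):
        return self.body(s).view(-1, N_ACTIONS, N_QUANTILES)

def conservative_quantile_loss(net, target_net, batch):
    """Distributional TD loss on offline transitions plus a CQL-style regularizer."""
    s, a, r, s_next, done = batch                    # tensors drawn from the offline dataset
    taus = (torch.arange(N_QUANTILES) + 0.5) / N_QUANTILES

    theta = net(s)                                   # (B, A, N): quantiles for every action
    theta_a = theta[torch.arange(len(a)), a]         # quantiles of the dataset action

    with torch.no_grad():
        next_theta = target_net(s_next)              # (B, A, N)
        a_star = next_theta.mean(dim=2).argmax(dim=1)    # risk-neutral bootstrap action
        target = r.unsqueeze(1) + GAMMA * (1 - done).unsqueeze(1) * \
                 next_theta[torch.arange(len(a)), a_star]

    # Quantile-regression Huber loss: dim 1 indexes predicted quantiles, dim 2 target samples.
    td = target.unsqueeze(1) - theta_a.unsqueeze(2)
    huber = F.smooth_l1_loss(theta_a.unsqueeze(2).expand_as(td),
                             target.unsqueeze(1).expand_as(td), reduction='none')
    qr_loss = (torch.abs(taus.view(1, -1, 1) - (td.detach() < 0).float()) * huber).mean()

    # CQL-style penalty: push down Q-values of all actions (logsumexp) while
    # pushing up the Q-value of the action actually observed in the dataset.
    q_all = theta.mean(dim=2)                        # (B, A)
    cql_penalty = (torch.logsumexp(q_all, dim=1) -
                   q_all[torch.arange(len(a)), a]).mean()
    return qr_loss + CQL_ALPHA * cql_penalty

def cvar_greedy_action(net, s):
    """Risk-aware action choice: maximize the mean of the lowest quantiles (CVaR)."""
    k = max(1, int(CVAR_FRACTION * N_QUANTILES))
    theta = net(s)                                   # (B, A, N)
    cvar = theta.sort(dim=2).values[..., :k].mean(dim=2)
    return cvar.argmax(dim=1)
```

A multi-agent version along the lines of the abstract would apply such a loss either independently per agent or to a joint action-value factorization; the sketch above only illustrates the conservative, risk-aware value update itself.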
Collections
- Open access [38840]