Meta-Offline and Distributional Multi-Agent RL for Risk-Aware Decision-Making
Eldeeb, Eslam; Alves, Hirley (2026-04-21)
Eldeeb, Eslam
Alves, Hirley
IEEE
21.04.2026
E. Eldeeb and H. Alves, "Meta-Offline and Distributional Multi-Agent RL for Risk-Aware Decision-Making," ICASSP 2026 - 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2026, pp. 20531-20535, doi: 10.1109/ICASSP55912.2026.11463052.
https://rightsstatements.org/vocab/InC/1.0/
© 2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
https://rightsstatements.org/vocab/InC/1.0/
© 2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
https://rightsstatements.org/vocab/InC/1.0/
Julkaisun pysyvä osoite on
https://urn.fi/URN:NBN:fi:oulu-202604232742
https://urn.fi/URN:NBN:fi:oulu-202604232742
Tiivistelmä
Abstract:
Mission critical applications, such as UAV-assisted IoT networks require risk-aware decision-making under dynamic topologies and uncertain channels. We propose meta-conservative quantile regression (M-CQR), a meta-offline distributional MARL algorithm that integrates conservative Q-learning (CQL) for safe offline learning, quantile regression DQN (QR-DQN) for risk-sensitive value estimation, and model-agnostic meta-learning (MAML) for rapid adaptation. Two variants are developed: meta-independent CQR (M-I-CQR) and meta-CTDE-CQR. In a UAV-based communication scenario, M-CTDE-CQR achieves up to 50% faster convergence and outperforms baseline MARL methods, offering improved scalability, robustness, and adaptability for risk-sensitive decision-making. Code is available at https://github.com/Eslam211/MA_Meta_ODRL
Mission critical applications, such as UAV-assisted IoT networks require risk-aware decision-making under dynamic topologies and uncertain channels. We propose meta-conservative quantile regression (M-CQR), a meta-offline distributional MARL algorithm that integrates conservative Q-learning (CQL) for safe offline learning, quantile regression DQN (QR-DQN) for risk-sensitive value estimation, and model-agnostic meta-learning (MAML) for rapid adaptation. Two variants are developed: meta-independent CQR (M-I-CQR) and meta-CTDE-CQR. In a UAV-based communication scenario, M-CTDE-CQR achieves up to 50% faster convergence and outperforms baseline MARL methods, offering improved scalability, robustness, and adaptability for risk-sensitive decision-making. Code is available at https://github.com/Eslam211/MA_Meta_ODRL
Kokoelmat
- Avoin saatavuus [43406]
