Deep Reinforcement Learning for Orchestrating Cost-Aware Reconfigurations of vRANs
Murti, Fahri Wisnu; Ali, Samad; Iosifidis, George; Latva-aho, Matti (2023-07-05)
IEEE
F. W. Murti, S. Ali, G. Iosifidis and M. Latva-aho, "Deep Reinforcement Learning for Orchestrating Cost-Aware Reconfigurations of vRANs," in IEEE Transactions on Network and Service Management, vol. 21, no. 1, pp. 200-216, Feb. 2024, doi: 10.1109/TNSM.2023.3292713
© 2023 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:oulu-202403252411
Abstract
Virtualized Radio Access Networks (vRANs) are fully configurable and can be implemented at low cost over commodity platforms to enable network management flexibility. In this paper, a novel vRAN reconfiguration problem is formulated to jointly reconfigure the functional splits of the base stations (BSs), the locations of the virtualized central units (vCUs) and distributed units (vDUs), their resources, and the routing of each BS data flow. The objective is to minimize the long-term total network operation cost while adapting to varying traffic demands and resource availability. As a first step, testbed measurements are performed to study the relationship between traffic demands and computing resources; this relationship exhibits high variance and depends on the platform and its load. Consequently, finding a perfect model of the underlying system is non-trivial. Therefore, to solve the proposed problem, a deep reinforcement learning (RL)-based framework is developed using model-free RL approaches. Moreover, the problem involves multiple BSs sharing the same resources, which results in a multi-dimensional discrete action space and a combinatorial number of possible actions. To overcome this curse of dimensionality, an action branching architecture, an action decomposition method with a shared decision module followed by neural network branches, is combined with the Dueling Double Deep Q-network (D3QN) algorithm. Simulations are carried out using an O-RAN compliant model and real traces from the testbed. Our numerical results show that the proposed framework successfully learns the optimal policy that adaptively selects the vRAN configurations, and its learning convergence can be further expedited through transfer learning, even across different vRAN systems. It also offers significant cost savings of up to 59% over a static benchmark, 35% over Deep Deterministic Policy Gradient with discretization, and 76% over non-branching D3QN.
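To illustrate the action-branching idea described in the abstract, the following is a minimal sketch (not the authors' code) of a branching dueling Q-network: a shared decision module feeds a common state-value head plus one advantage head per action dimension (e.g., per-BS split, placement, or routing choice), so the number of network outputs grows additively with the branches rather than combinatorially with the joint action space. All layer sizes and the action layout are illustrative assumptions.

import torch
import torch.nn as nn


class BranchingDuelingQNet(nn.Module):
    def __init__(self, state_dim: int, branch_sizes: list[int], hidden: int = 128):
        super().__init__()
        # Shared decision module over the vRAN state (traffic demands, loads, ...).
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Dueling architecture: one scalar state value shared by all branches.
        self.value = nn.Linear(hidden, 1)
        # One advantage head per action dimension (branch).
        self.advantages = nn.ModuleList(
            nn.Linear(hidden, n) for n in branch_sizes
        )

    def forward(self, state: torch.Tensor) -> list[torch.Tensor]:
        z = self.trunk(state)
        v = self.value(z)  # shape: (batch, 1)
        # Per-branch Q-values with the standard mean-advantage baseline.
        return [v + a(z) - a(z).mean(dim=1, keepdim=True)
                for a in self.advantages]


# Usage: 3 BSs, each choosing among 4 functional splits -> 3 branches of size 4,
# i.e., 12 outputs instead of 4**3 = 64 joint actions.
net = BranchingDuelingQNet(state_dim=16, branch_sizes=[4, 4, 4])
qs = net(torch.randn(2, 16))
action = [q.argmax(dim=1) for q in qs]  # one discrete choice per branch

In a full agent, each branch's Q-values would be trained with the Double DQN target, which is how the paper's D3QN component would plug into this decomposition.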
Collections
- Open access [37798]