SOQ: Structural Reinforcement Learning for Constrained Delay Minimization With Channel State Information

Citations

Web of Science
5

Scopus
4
Abstract

The goal of this study is to minimize the average delay under an average energy consumption constraint in a single-queue, single-server wireless communication system with block fading channels. To this end, we formulate the problem as an infinite-horizon constrained Markov decision process (CMDP). In our CMDP, we jointly consider the queue length and channel condition as the state. We apply the Lagrange multiplier method to transform the constrained optimization problem into an unconstrained optimization problem. Then, we prove that an optimal scheduling strategy is nondecreasing with respect to queue length and channel state. To obtain an optimal scheduling policy, we propose an efficient reinforcement learning algorithm, the Structural-Optimistic Q-learning algorithm (SOQ), which exploits the nondecreasing property of optimal policies by using policy projection. Finally, we analyze how to control the average energy consumption to satisfy a given energy consumption constraint. The simulation results show that SOQ outperforms the traditional Q-learning algorithm in terms of the average cost during the learning phase.
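The two structural ideas in the abstract, the Lagrangian relaxation and the projection onto nondecreasing policies, can be illustrated with a minimal Python sketch. It is not the SOQ algorithm itself: the function names, the use of queue length as the per-slot delay cost, and the running-maximum projection are all assumptions made for illustration, and the example policy table is invented.

```python
import numpy as np

def lagrangian_cost(queue_len, energy, lam):
    # Assumed per-slot cost: queue length stands in for delay, and the
    # multiplier lam prices energy use (Lagrangian relaxation of the CMDP).
    return queue_len + lam * energy

def project_nondecreasing(policy):
    # Illustrative projection (an assumption, not the paper's operator):
    # a running maximum along the queue axis and then the channel axis
    # yields a table that is nondecreasing in both state components.
    projected = np.maximum.accumulate(policy, axis=0)
    return np.maximum.accumulate(projected, axis=1)

# Hypothetical greedy policy: rows index queue length, columns index
# channel state, entries are the number of packets to transmit.
greedy = np.array([
    [0, 0, 1],
    [1, 0, 1],
    [0, 2, 1],
    [2, 1, 3],
])
monotone = project_nondecreasing(greedy)
print(monotone)
```

In this sketch each projected entry equals the largest action chosen at any state with a queue length and channel index no larger than its own, which is what forces the table to be nondecreasing in both directions.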

Keywords

Delays; Optimal scheduling; Internet of Things; Energy consumption; Communication systems; Wireless communication; Markov processes; Constrained optimization; cross-layer design; infinite-horizon Markov decision process (MDP); Internet of Things (IoT); Lagrange multiplier method; reinforcement learning (RL); upper confidence bound (UCB); ENERGY EFFICIENCY; FADING CHANNELS; LOW-LATENCY; TRANSMISSION; COMMUNICATION; NETWORKS; POLICIES; AWARE
Title
SOQ: Structural Reinforcement Learning for Constrained Delay Minimization With Channel State Information
Authors
Zhao, Yu; Kim, Yeongjin; Lee, Joohyun
DOI
10.1109/JIOT.2023.3299598
Publication date
2024-02-01
Type
Article
Journal
IEEE Internet of Things Journal
Volume
11
Issue
3
Pages
4628-4644