ZHAO S, JIA Z P, ZHU X L, et al. Dynamic offloading cost optimization of Internet of vehicles based on deep reinforcement learning[J]. Journal of Henan Polytechnic University (Natural Science), 2025, 44(6): 191-200.
doi:10.16186/j.cnki.1673-9787.2023020018
Received: 2023/02/07
Revised: 2023/07/28
Published: 2025/10/14
Dynamic offloading cost optimization of Internet of vehicles based on deep reinforcement learning
Zhao Shan, Jia Zongpu, Zhu Xiaoli, Pang Xiaoyan, Gu Kunyuan
School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo 454000, Henan, China
Abstract: Objectives This study addresses the key problems of task offloading and resource allocation in the Internet of vehicles under imperfect channel conditions, with the goal of reducing computational cost. Methods A basic Internet of vehicles task offloading environment was abstracted with imperfect channel characteristics taken into account. The task offloading ratio, power selection, and server resource allocation were jointly optimized, and a long-term average cost minimization problem was formulated over all users. Because the solution variables are continuous, a dynamic offloading optimization scheme based on deep reinforcement learning was adopted, and the SP-DDPG algorithm (deep deterministic policy gradient with importance sampling and prioritized experience replay) was proposed to solve the problem. SP-DDPG was compared with several existing deep reinforcement learning methods while a single variable was varied at a time, using two key indicators: the average offloading cost and the number of dropped tasks. Results Compared with the full offloading algorithms F-DDPG and DDQN, SP-DDPG reduced the average task offloading cost by about 36.13% and 44.02%, and the number of dropped tasks by at least 4.38% and 9.76%, respectively. Compared with the partial offloading algorithm DDPG, the average offloading cost and the number of dropped tasks decreased by 13.34% and 3.17%, respectively. The experimental results were averaged over multiple runs (delay-energy tradeoff factor ω=0.5, channel estimation accuracy ρ=0.95), which supports their reliability. Conclusions Compared with several conventional deep reinforcement learning algorithms, the proposed SP-DDPG algorithm, an optimized deep deterministic policy gradient method, achieved lower computational cost and better performance in unstable and complexly changing Internet of vehicles environments.
Key words: Internet of vehicles; partial offloading; resource allocation; deep deterministic policy gradient; imperfect channel
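The abstract names importance sampling and prioritized experience replay as the additions that distinguish SP-DDPG from plain DDPG, but gives no implementation detail. The following is a minimal sketch of the widely used proportional variant of prioritized replay, shown only to illustrate the general idea. The class name `PrioritizedReplayBuffer`, the parameters `alpha`, `beta`, and `eps`, and their default values are assumptions for illustration and are not taken from the paper.

```python
import random


class PrioritizedReplayBuffer:
    """Illustrative proportional prioritized replay (not the paper's code)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha   # assumed: 0 gives uniform sampling, 1 fully prioritized
        self.eps = eps       # assumed: keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0         # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions receive the current maximum priority so that
        # each one has a high chance of being replayed at least once.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        # Sampling probability is proportional to priority ** alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idx = random.choices(range(len(self.data)), weights=probs, k=batch_size)
        # Importance-sampling weights offset the bias introduced by
        # non-uniform sampling; here they are normalized by the batch maximum.
        n = len(self.data)
        weights = [(n * probs[i]) ** (-beta) for i in idx]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors):
        # After a learning step, priorities follow the absolute TD error.
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a DDPG-style training loop, the returned weights would typically multiply each sample's squared TD error in the critic loss, and `update_priorities` would be called with the new TD errors after each update. Whether SP-DDPG follows exactly this scheme cannot be determined from the abstract.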