

ZHAO S, JIA Z P, ZHU X L, et al. Dynamic offloading cost optimization of Internet of vehicles based on deep reinforcement learning[J]. Journal of Henan Polytechnic University (Natural Science), 2025, 44(6): 191-200.

Dynamic offloading cost optimization of Internet of vehicles based on deep reinforcement learning

Zhao Shan, Jia Zongpu, Zhu Xiaoli, Pang Xiaoyan, Gu Kunyuan

School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo 454000, Henan, China

Abstract: Objectives To solve the key problems of task offloading and resource allocation in the Internet of vehicles with imperfect channels, and to reduce the computation cost. Methods The basic vehicular task-offloading environment was abstracted by incorporating the characteristics of imperfect channels; the task offloading ratio, transmit power selection, and server resource allocation were jointly optimized, and a long-term average cost minimization model over all users was established. A dynamic offloading optimization scheme based on deep reinforcement learning was adopted and, considering the continuity of the decision variables, an improved deep deterministic policy gradient algorithm, SP-DDPG (deep deterministic policy gradient with importance sampling and prioritized experience replay), was proposed to solve the model. The performance of SP-DDPG under the influence of a single variable was compared with that of several existing deep reinforcement learning methods on two key indicators: the average offloading cost and the number of dropped tasks. Results Compared with the full-offloading baselines F-DDPG and DDQN, the proposed algorithm reduced the average task offloading cost by about 36.13% and 44.02%, and the number of dropped tasks by at least 4.38% and 9.76%, respectively; compared with the partial-offloading algorithm DDPG, the average offloading cost and the number of dropped tasks decreased by 13.34% and 3.17%, respectively. All results were averaged over multiple runs (delay-energy tradeoff factor ω = 0.5, channel estimation accuracy ρ = 0.95), giving them good reliability. Conclusions In a complex, time-varying, and unstable Internet-of-vehicles environment, the proposed SP-DDPG algorithm achieves a lower task computation cost and better task-processing performance than several conventional deep reinforcement learning algorithms.
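The abstract states the optimization objective only in words. As a reading aid, a plausible weighted-sum form consistent with the stated delay-energy tradeoff factor ω and channel estimation accuracy ρ is sketched below in LaTeX; the symbols C_i(k), T_i(k), E_i(k) and the Gauss-Markov error model are illustrative assumptions, as the exact formulation appears only in the full paper.

% Assumed per-user cost in time slot k: a convex combination of the task
% delay T_i(k) and the energy consumption E_i(k); omega = 0.5 weighs the
% two terms equally.
C_i(k) = \omega\, T_i(k) + (1-\omega)\, E_i(k), \qquad \omega \in [0,1]

% Long-term average cost over all N users, minimized jointly over the task
% offloading ratio, transmit power selection, and server resource allocation:
\min\ \lim_{K \to \infty} \frac{1}{K} \sum_{k=1}^{K} \frac{1}{N} \sum_{i=1}^{N} C_i(k)

% A standard Gauss-Markov model for the imperfect channel: the estimate
% \hat{h} retains a fraction rho of the true gain h, the rest is error e.
\hat{h} = \rho\, h + \sqrt{1-\rho^{2}}\; e, \qquad e \sim \mathcal{CN}(0,1)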

Key words: Internet of vehicles; partial offloading; resource allocation; deep deterministic policy gradient; imperfect channel
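The expansion of the name SP-DDPG, deep deterministic policy gradient with importance sampling and prioritized experience replay, indicates that the algorithm replaces DDPG's uniform replay buffer with a prioritized one whose sampling bias is corrected by importance-sampling weights. The Python sketch below shows only these two ingredients in their standard proportional-prioritization form; it is a minimal illustration under that assumption, not the authors' implementation, and every identifier and hyperparameter (capacity, alpha, beta) is hypothetical.

import numpy as np

class PrioritizedReplayBuffer:
    """Proportional prioritized replay with importance-sampling correction."""

    def __init__(self, capacity=10000, alpha=0.6):
        self.capacity = capacity   # maximum number of stored transitions
        self.alpha = alpha         # how strongly TD error shapes priority
        self.storage = []          # transitions: (s, a, r, s_next, done)
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0               # ring-buffer write index

    def add(self, transition):
        # A new transition gets the current maximum priority so it is
        # replayed at least once before its TD error is known.
        max_prio = self.priorities[:len(self.storage)].max() if self.storage else 1.0
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        # Sample indices with probability proportional to priority^alpha.
        prios = self.priorities[:len(self.storage)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.storage), batch_size, p=probs)
        # Importance-sampling weights undo the bias that non-uniform
        # sampling introduces into the expected gradient (beta -> 1 gives
        # full correction); normalize by the max for stable step sizes.
        weights = (len(self.storage) * probs[idx]) ** (-beta)
        weights /= weights.max()
        batch = [self.storage[i] for i in idx]
        return batch, idx, weights

    def update_priorities(self, idx, td_errors, eps=1e-6):
        # After the critic update, refresh priorities from the new TD errors.
        self.priorities[idx] = np.abs(td_errors) + eps

In a DDPG critic update, each sampled transition's squared TD error would be multiplied by its returned weight before averaging, after which update_priorities would be called with the fresh TD errors so that informative transitions are replayed more often.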

doi:10.16186/j.cnki.1673-9787.2023020018

Funding: National Natural Science Foundation of China (62276092); Training Plan of Young Backbone Teachers in Universities of Henan Province (2019GGJS061)

Received: 2023-02-07

Revised: 2023-07-28

Published: 2025-10-14

