
ZHAO Y R, NIU D J, SUN D H, et al. Video person re-identification based on convolutional neural network and Transformer[J]. Journal of Henan Polytechnic University (Natural Science), 2023, 42(6): 149-156.

Video person re-identification based on convolutional neural network and Transformer

ZHAO Yanru, NIU Dongjie, SUN Donghong, YANG Huimeng

School of Mechanical and Power Engineering, Henan Polytechnic University, Jiaozuo 454000, Henan, China

Abstract: To solve the problem of poor person feature extraction when only a convolutional neural network is used for video person re-identification, a network model ResTNet (ResNet and Transformer network), based on a convolutional neural network and Transformer, was proposed. In ResTNet, a ResNet50 network was used to obtain local features, and the output of its middle layer was fed to the Transformer branch as prior knowledge. In the Transformer branch, the size of the feature map was progressively reduced and the receptive field was expanded, so that the relationships among local features were fully exploited to generate global features of pedestrians, while the shifted-window method was used to reduce the model's computation. On the large-scale MARS dataset, Rank-1 and mAP reached 86.8% and 80.3%, respectively, 3.8% and 3.3% higher than the baseline, and good results were also obtained on two small-scale datasets. The Transformer model was thus successfully applied to video person re-identification, and extensive experiments on several datasets showed that the proposed ResTNet could enhance the robustness of person recognition and effectively improve the accuracy of person re-identification.
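Below is a minimal PyTorch sketch, for illustration only, of how a two-branch CNN-plus-Transformer backbone of this kind could be wired: a ResNet-50 middle-layer feature map is flattened into tokens and passed to a small Transformer encoder, and the CNN local descriptor and Transformer global descriptor are concatenated. The module name HybridReIDBackbone and all hyper-parameters are assumptions; this is not the authors' ResTNet code, and it omits the shifted-window attention the paper uses to cut computation.

# Illustrative sketch only (not the authors' released code): a hybrid
# CNN + Transformer re-identification backbone in the spirit of ResTNet.
import torch
import torch.nn as nn
from torchvision.models import resnet50  # torchvision >= 0.13 API

class HybridReIDBackbone(nn.Module):
    def __init__(self, num_ids: int, embed_dim: int = 256, depth: int = 4, heads: int = 8):
        super().__init__()
        cnn = resnet50(weights=None)
        # Stem + layer1-3 produce the mid-level map handed to the Transformer branch.
        self.cnn_mid = nn.Sequential(
            cnn.conv1, cnn.bn1, cnn.relu, cnn.maxpool,
            cnn.layer1, cnn.layer2, cnn.layer3,   # output: (B, 1024, H/16, W/16)
        )
        self.cnn_top = cnn.layer4                 # CNN/local branch continues: (B, 2048, H/32, W/32)
        self.proj = nn.Conv2d(1024, embed_dim, kernel_size=1)  # map channels to token dim
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(2048 + embed_dim, num_ids)

    def forward(self, x):                         # x: (B, 3, H, W)
        mid = self.cnn_mid(x)                     # shared mid-level features ("prior knowledge")
        local = self.pool(self.cnn_top(mid)).flatten(1)      # CNN local descriptor, (B, 2048)
        tokens = self.proj(mid).flatten(2).transpose(1, 2)   # (B, HW, embed_dim)
        global_feat = self.transformer(tokens).mean(dim=1)   # Transformer global descriptor
        feat = torch.cat([local, global_feat], dim=1)
        return feat, self.classifier(feat)

For video input, such a backbone would typically be applied per frame, with frame features averaged or otherwise aggregated before the classification and metric-learning losses.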

Key words: video person re-identification; convolutional neural network; Transformer; local feature; global feature

doi:10.16186/j.cnki.1673-9787.2021120013

Funding: National Natural Science Foundation of China (51505133); Science and Technology Research Project of Henan Province (212102210316); Open Project of the Henan Provincial Engineering Laboratory of Photoelectric Sensing and Intelligent Measurement and Control, Henan Polytechnic University (HELPSIMC-2020-006)

Received: 2021/12/03

Revised: 2022/02/25

Published: 2023/11/25


