Author: ZHAO Shan, TIAN Kaiwen, SUN Junding | Time: 2024-09-24
ZHAO S, TIAN K W, SUN J D, et al. Rich semantic extractor network for real-time semantic segmentation[J]. Journal of Henan Polytechnic University (Natural Science), 2024, 43(6): 146-155.
doi:10.16186/j.cnki.1673-9787.2023030005
Received: 2023-03-02
Revised: 2023-05-14
Published: 2024-09-24
Rich semantic extractor network for real-time semantic segmentation
ZHAO Shan1, TIAN Kaiwen1, SUN Junding2
1. School of Software, Henan Polytechnic University, Jiaozuo 454000, Henan, China; 2. School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo 454000, Henan, China
Abstract: Objectives Real-time semantic segmentation networks are constrained in inference speed and tend to be shallow, which leads to insufficient extraction of semantic feature information. The shallow network depth also restricts the capability of the feature extraction network, reducing its robustness and adaptability. To solve these problems, Methods a rich semantic extractor network (RSENet) for real-time semantic segmentation was proposed. Firstly, aiming at the problem of inadequate semantic feature extraction, a rich semantic extractor (RSE) was introduced, which included a multi-scale global semantic extraction module (MGSEM) and a semantic fusion module (SFM). MGSEM was used to extract rich multi-scale global semantics and expand the effective receptive field of the network. At the same time, SFM efficiently fused multi-scale local semantics with multi-scale global semantics, so that the network obtained more comprehensive and richer semantic information. Finally, according to the characteristics of the detail branch and the semantic branch, a space reconstruction aggregation module (SRAM) was designed to model the context information of the detail features and enhance the feature representation, so that the two branches could be efficiently aggregated. Results Comprehensive experiments were conducted on the Cityscapes and ADE20K datasets, and the proposed RSENet achieved mIoU of 75.6% and 35.7% at inference speeds of 76 frames/s and 67 frames/s, respectively.
Conclusions The experimental results suggested that the proposed network could deeply explore and accurately capture the semantic information of images in complex scenes. Furthermore, it demonstrated an outstanding balance between accuracy and speed, delivering high-precision semantic segmentation while maintaining very fast inference. This efficient image segmentation capability gives the network high practicality and operability in real-world application scenarios.
Key words: semantic segmentation; multi-scale feature; vision Transformer; feature fusion