Speech enhancement method based on dual-channel convolutional attention network

LI Hui1, JING Hao2, YAN Kanghua2, ZOU Borong1, HOU Qinghua1, WU Huibin1, et al. Speech enhancement method based on dual-channel convolutional attention network[J]. Journal of Henan Polytechnic University (Natural Science), 2022, 41(5): 127-136.


LI Hui1, JING Hao2, YAN Kanghua2, ZOU Borong1, HOU Qinghua1, WU Huibin1

1. School of Physics & Electronic Information Engineering, Henan Polytechnic University, Jiaozuo 454000, Henan, China
2. School of Electrical Engineering and Automation, Henan Polytechnic University, Jiaozuo 454000, Henan, China

Abstract: The traditional single-channel network model cannot fully extract the deep features of speech because of its limited representation ability, so its enhancement effect is insignificant. In view of this, a speech enhancement method based on a dual-channel convolutional attention network was proposed. First, a convolutional neural network and a long short-term memory network were used to construct a parallel dual-channel learning module, which combines the advantages of the two different networks to fully mine the deep features of speech. Second, an attention module was added to each channel to weight that channel's output features according to their degree of attention, thereby emphasizing useful information. Finally, the outputs of the two channels were fused to obtain the enhanced features. Experimental results showed that in low-SNR and non-stationary noise environments, the enhancement effect of the model containing the dual-channel structure and attention modules was clearly better than that of the other compared models; it effectively improved the quality and intelligibility of the enhanced speech and confirmed the feasibility of the proposed model.

Key words: speech enhancement; convolutional neural network; long short-term memory network; dual-channel learning module; attention module
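
The abstract above outlines the architecture: a CNN channel and an LSTM channel learn in parallel, each channel's output is re-weighted by its own attention module, and the two weighted outputs are fused into the enhanced features. The PyTorch sketch below only illustrates that idea, assuming a magnitude-spectrogram input of shape (batch, frames, freq_bins); the layer sizes, the sigmoid-gate form of the attention, and the linear fusion layer are assumptions of this sketch, not the configuration reported in the paper.

```python
# Minimal sketch of a dual-channel convolutional attention network for
# speech enhancement. Input is assumed to be a magnitude spectrogram of
# shape (batch, frames, freq_bins); all sizes here are illustrative.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Weights a channel's output features by a learned attention gate."""

    def __init__(self, dim):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, x):                # x: (batch, frames, dim)
        return x * self.score(x)         # emphasize useful features


class DualChannelEnhancer(nn.Module):
    def __init__(self, freq_bins=257, hidden=256):
        super().__init__()
        # Channel 1: convolutions capture local spectro-temporal patterns.
        self.cnn = nn.Sequential(
            nn.Conv1d(freq_bins, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # Channel 2: LSTM captures long-range temporal dependencies.
        self.lstm = nn.LSTM(freq_bins, hidden, num_layers=2, batch_first=True)
        # One attention module per channel, then fusion back to the spectrum.
        self.att_cnn = ChannelAttention(hidden)
        self.att_lstm = ChannelAttention(hidden)
        self.fuse = nn.Linear(2 * hidden, freq_bins)

    def forward(self, noisy):            # noisy: (batch, frames, freq_bins)
        c = self.cnn(noisy.transpose(1, 2)).transpose(1, 2)   # CNN channel
        r, _ = self.lstm(noisy)                               # LSTM channel
        fused = torch.cat([self.att_cnn(c), self.att_lstm(r)], dim=-1)
        return self.fuse(fused)          # enhanced spectrum estimate


if __name__ == "__main__":
    model = DualChannelEnhancer()
    print(model(torch.randn(4, 100, 257)).shape)   # torch.Size([4, 100, 257])
```

Running the script prints the shape of the enhanced-spectrum estimate, which confirms that the two channels and the fusion layer are dimensionally consistent in this sketch.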

doi:10.16186/j.cnki.1673-9787.2020060014

Funding: National Natural Science Foundation of China (62101176); Key R&D and Promotion Special Project (Science and Technology Research) of Henan Province (222102210247)

Received: 2020/06/04

Revised: 2021/12/06

Published: 2022/09/25


  016_2020060014_李辉_H.pdf
