Source: Natural Science Edition journal, 2019, Issue 05
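A multi-strategy combined address matching algorithm (一种多策略结合的地址匹配算法)
Date: 2019-10-23

Authors: WU Rui (吴睿), LONG Hua (龙华), XIONG Xin (熊新), PENG Yi (彭艺)

Affiliation: Faculty of Information Engineering and Automation, Kunming University of Science and Technology (昆明理工大学信息工程与自动化学院)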

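Abstract (translated from the Chinese): Existing address matching algorithms suffer from ambiguous address element segmentation and from low matching rates and accuracy. To address these problems, a multi-strategy combined address matching algorithm is proposed. A bidirectional maximum matching word segmentation algorithm extracts the ambiguous address elements. A dictionary of address element feature words and a standard address database are built to perform a first round of disambiguation, and Chinese word segmentation based on sequence labeling performs a second round. Each resulting address element is then matched against the database and a similarity matching score is computed. Finally, weights are assigned according to the importance of each address element, and the weighted sum gives the total matching score. The results show that the algorithm outperforms other traditional address matching algorithms and improves both the matching rate and the accuracy of address matching.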

Funding: National Natural Science Foundation of China (61761025)

Keywords: multi-strategy; address matching; sequence labeling; weight; matching score

DOI:10.16186/j.cnki.1673-9787.2019.5.18

Classification number: TP391.1

A multi-strategy combined address matching algorithm

WU Rui, LONG Hua, XIONG Xin, PENG Yi

Faculty of Information Engineering and Automation, Kunming University of Science and Technology

Abstract: To address ambiguous address element segmentation and the low matching rate and accuracy of existing address matching algorithms, a multi-strategy combined address matching algorithm (MSC) is proposed. MSC uses a bidirectional maximum matching word segmentation algorithm to extract ambiguous address elements. First, the ambiguous results are disambiguated using an address element feature word dictionary and a standard address database. Then, Chinese word segmentation based on sequence labeling is used for a second round of disambiguation. Next, each address element is matched against the database and a similarity matching score is calculated. Finally, weights are assigned according to the importance of each address element, and the total score is obtained by weighted summation. The experimental results show that MSC improves the matching rate and accuracy of address matching and outperforms other traditional address matching algorithms.
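
The first and last steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy dictionary, the element names (`city`, `street`), the weights, and the function names are all assumptions made for illustration, and the two disambiguation stages (feature word dictionary plus standard database, then sequence labeling) are omitted. The sketch only shows how forward and backward maximum matching can disagree on a string, which flags it as ambiguous, and how per-element similarity scores could be combined by a weighted sum.

```python
def forward_max_match(text, dictionary, max_len):
    """Greedy left-to-right segmentation, longest dictionary word first."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no dictionary word fits.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, dictionary, max_len):
    """Greedy right-to-left segmentation, longest dictionary word first."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()
    return tokens


def bidirectional_max_match(text, dictionary):
    """Return both segmentations and whether they disagree (ambiguity flag)."""
    max_len = max((len(word) for word in dictionary), default=1)
    fwd = forward_max_match(text, dictionary, max_len)
    bwd = backward_max_match(text, dictionary, max_len)
    return fwd, bwd, fwd != bwd


def weighted_total_score(element_scores, weights):
    """Weighted sum of per-element similarity scores (missing elements score 0)."""
    return sum(w * element_scores.get(name, 0.0) for name, w in weights.items())


# Toy dictionary (hypothetical), chosen so the two directions disagree.
TOY_DICTIONARY = {"南京市", "南京市长", "长江大桥", "江大桥"}

if __name__ == "__main__":
    fwd, bwd, ambiguous = bidirectional_max_match("南京市长江大桥", TOY_DICTIONARY)
    print(fwd, bwd, ambiguous)
    # Hypothetical weights and similarity scores for two address elements.
    print(weighted_total_score({"city": 1.0, "street": 0.5},
                               {"city": 0.25, "street": 0.75}))
```

In this toy case the forward pass yields `南京市长 / 江大桥` and the backward pass yields `南京市 / 长江大桥`, so the string would be passed on to the disambiguation stages. How the weights are chosen is not stated in the abstract beyond "the importance of each address element", so the values above are placeholders.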
