Contributed by: WU Rui (吴睿); LONG Hua (龙华); XIONG Xin (熊新); PENG Yi (彭艺) | Date: 2019-10-23
Affiliation: Faculty of Information Engineering and Automation, Kunming University of Science and Technology
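Abstract (translated from the Chinese): Existing address matching algorithms suffer from ambiguous segmentation of address elements and from low matching rates and accuracy. To address these problems, an address matching algorithm that combines multiple strategies is proposed. A bidirectional maximum matching word segmentation algorithm extracts the ambiguous address elements. An address element feature-word dictionary and a standard address database are built and used for a first round of disambiguation, and Chinese word segmentation based on sequence labeling is then used for a second round. Each resulting address element is matched against the database and a similarity matching score is computed. Finally, weights are assigned according to the importance of each address element, and the weighted sum gives the total matching score. The results show that the algorithm outperforms other traditional address matching algorithms and improves both the matching rate and the accuracy of address matching.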
DOI:10.16186/j.cnki.1673-9787.2019.5.18
CLC number: TP391.1
A multi-strategy combined address matching algorithm
WU Rui, LONG Hua, XIONG Xin, PENG Yi
Faculty of Information Engineering and Automation, Kunming University of Science and Technology
Abstract: To address the ambiguous segmentation of address elements and the low matching rate and accuracy of existing address matching algorithms, a multi-strategy combined address matching algorithm (MSC) is proposed. MSC uses the bidirectional maximum matching word segmentation algorithm to extract ambiguous address elements. A first round of disambiguation is performed with an address element feature-word dictionary and a standard address database. A second round then applies Chinese word segmentation based on sequence labeling. Each resulting address element is matched against the database and a similarity matching score is computed. Finally, weights are assigned according to the importance of each address element, and the weighted sum gives the total matching score. The experimental results show that MSC outperforms other traditional address matching algorithms and improves both the matching rate and the accuracy of address matching.
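
The two ends of the pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The tiny dictionary, the made-up address string, the element names, and the weight values are all assumptions chosen for illustration, and `difflib.SequenceMatcher` stands in for the similarity measure, which the abstract does not specify. The two disambiguation rounds (feature-word dictionary and sequence labeling) are not shown.

```python
from difflib import SequenceMatcher


def forward_max_match(text, vocab, max_len):
    """Scan left to right, taking the longest dictionary word at each position."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, vocab, max_len):
    """Scan right to left, taking the longest dictionary word ending at each position."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()
    return tokens


def bidirectional_segment(text, vocab):
    """Return both segmentations and a flag that is True when they disagree."""
    max_len = max(map(len, vocab))
    fwd = forward_max_match(text, vocab, max_len)
    bwd = backward_max_match(text, vocab, max_len)
    return fwd, bwd, fwd != bwd


def weighted_match_score(query, standard, weights):
    """Weighted sum of per-element similarities (weights are hypothetical)."""
    total = 0.0
    for level, weight in weights.items():
        q, s = query.get(level, ""), standard.get(level, "")
        # SequenceMatcher.ratio() is a stand-in similarity in [0, 1].
        sim = SequenceMatcher(None, q, s).ratio() if q and s else 0.0
        total += weight * sim
    return total


# Made-up dictionary and address: the two scan directions disagree,
# so this span would be passed on to the disambiguation steps.
vocab = {"新华", "新华路", "路口村"}
fwd, bwd, ambiguous = bidirectional_segment("新华路口村", vocab)

# Hypothetical weights: finer-grained elements count more.
weights = {"province": 0.1, "city": 0.2, "district": 0.3, "road": 0.4}
standard = {"province": "云南省", "city": "昆明市", "district": "五华区", "road": "新华路"}
query = dict(standard, road="新华街")
score = weighted_match_score(query, standard, weights)
```

Here the forward scan yields `新华路 / 口 / 村` and the backward scan yields `新华 / 路口村`, which is the kind of disagreement the algorithm treats as an ambiguous address element. In the scoring step, a query that differs from the standard record only in the road element scores below the maximum of 1.0, with the loss scaled by the road weight.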