Geography and Geo-Information Science (地理与地理信息科学)

2021, Vol. 37(04), pp. 10-15


A Chinese Address Resolution Method Based on BERT-BiLSTM-CRF

WU Ke-han; ZHANG Xue-ying; YE Peng; HUAI An; ZHANG Hang
Key Laboratory of Urban Land Resources Monitoring and Simulation, MNR; Key Laboratory of Virtual Geographic Environment of Ministry of Education, Nanjing Normal University, Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application

Abstract:

Chinese address resolution is a key step in address matching and is widely used in address retrieval, geocoding, and address information recognition. Traditional address resolution methods, however, suffer from limited coverage, heavy manual involvement, and poor generalization. To exploit the ability of deep learning models to learn contextual features automatically through deep structures, this paper proposes a Chinese address resolution method based on a BERT-BiLSTM-CRF deep learning model. First, the BIOES tagging scheme is extended according to a multi-level classification system of Chinese address elements and is used to annotate an address corpus. Then, building on a pre-trained language model, a deep learning model that integrates BERT, BiLSTM, and CRF is constructed. The BERT pre-trained language model supplies character vectors rich in semantic information, which compensates for the lack of specificity in static word vectors and improves the extraction of complex address elements. The model is evaluated on 2019 address data from Shenzhen. For most types of Chinese address elements, the resolution accuracy exceeds 90%. Compared with deep learning models such as IDCNN-CRF and BiLSTM-CRF, the method performs better when only a small-scale address corpus is available, and it maintains good performance when resolving multiple types of address elements.
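The character-level BIOES tagging that the abstract refers to can be illustrated with a minimal sketch. The abstract does not list the paper's extended tag set or its multi-level element classes, so the label names here (`PROV`, `CITY`, `DIST`, `ROAD`, `NUM`, `UNIT`), the sample address, and the helper functions `bioes_tags` and `decode_elements` are all hypothetical. The sketch shows plain BIOES with a type suffix, not the authors' extended scheme.

```python
from typing import List, Tuple

Segment = Tuple[str, str]  # (text, label); label "O" marks non-element text


def bioes_tags(segments: List[Segment]) -> List[Tuple[str, str]]:
    """Convert (text, label) address segments into per-character BIOES tags."""
    tagged = []
    for text, label in segments:
        n = len(text)
        for i, ch in enumerate(text):
            if label == "O":
                tag = "O"
            elif n == 1:
                tag = f"S-{label}"   # single-character element
            elif i == 0:
                tag = f"B-{label}"   # beginning of element
            elif i == n - 1:
                tag = f"E-{label}"   # end of element
            else:
                tag = f"I-{label}"   # inside element
            tagged.append((ch, tag))
    return tagged


def decode_elements(tagged: List[Tuple[str, str]]) -> List[Segment]:
    """Recover (text, label) address elements from per-character BIOES tags."""
    elements = []
    buf, cur = "", None
    for ch, tag in tagged:
        if tag == "O":
            buf, cur = "", None
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "S":
            elements.append((ch, label))
            buf, cur = "", None
        elif prefix == "B":
            buf, cur = ch, label
        elif prefix == "I" and cur == label:
            buf += ch
        elif prefix == "E" and cur == label:
            elements.append((buf + ch, label))
            buf, cur = "", None
        else:
            buf, cur = "", None  # malformed sequence: drop the partial element
    return elements


# Hypothetical address and labels, for illustration only.
segments = [
    ("广东省", "PROV"),
    ("深圳市", "CITY"),
    ("南山区", "DIST"),
    ("科技路", "ROAD"),
    ("8号", "NUM"),
    ("A", "UNIT"),
]
tagged = bioes_tags(segments)
for ch, tag in tagged:
    print(ch, tag)
```

In a pipeline of the kind the abstract describes, a sequence model would predict one such tag per character, and a decoding step like `decode_elements` would turn the predicted tags back into typed address elements.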

Keywords: Chinese address; address element classification; address annotation; BERT-BiLSTM-CRF; address resolution model

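Foundation: Open Fund of the Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (KF-2019-04-025); National Natural Science Foundation of China (41631177); National Key Research and Development Program of China (2017YFB0503602)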
