2021, No. 03, Vol. 37, pp. 1-8
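A Method for Fine Identification of Spatio-Temporal Information of Emergencies in Weibo Based on BiLSTM-CRF and Classified-Hierarchical Annotation
Foundation: National Natural Science Foundation of China (41561084, 41201409); China Postdoctoral Science Foundation (2018M632991)
Email: hulieyun@126.com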
Abstract (Chinese version, translated):

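To address the low accuracy and coarse granularity of existing methods for identifying spatio-temporal information in Weibo posts, this paper proposes a method for the fine identification of spatio-temporal information of emergencies in Weibo (MFISIE), based on a bidirectional long short-term memory network with a conditional random field (BiLSTM-CRF) and classified-hierarchical annotation. First, a classified-hierarchical spatio-temporal information annotation system (CHSIAS) suited to emergencies in Weibo is designed, and a Weibo corpus is built. Then, a Weibo spatio-temporal information identification model is built with BiLSTM-CRF, and experiments are run on 117 567 labeled Weibo corpus entries. The results show that, compared with the People's Daily corpus annotation system, CHSIAS yields higher F-measures when combined with each of the three methods (CRF, BiLSTM, and BiLSTM-CRF) and provides multi-level, fine-grained spatio-temporal information on emergencies; the BiLSTM-CRF-based MFISIE method attains the highest F-measure (91.2%). With CHSIAS, BiLSTM-CRF performs best on time information, and its F-measures for points of interest, buildings, and relative position descriptions are 8.8%, 6.3%, and 12.3% higher than those of BiLSTM, and 7.1%, 7.7%, and 8.9% higher than those of CRF, respectively. MFISIE extracts the spatio-temporal information of emergencies in Weibo more precisely and provides technical support for the rapid perception and precise application of emergency information.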
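The F-measure reported above is commonly computed at the entity level for sequence-labeling tasks. The following is a minimal sketch of that calculation, assuming a plain BIO tagging scheme; the tag names `TIME` and `POI` and the helper names `bio_spans` and `f_measure` are hypothetical and are not taken from the paper, whose actual CHSIAS label set and evaluation script are not given on this page.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive); hypothetical representation
Span = Tuple[str, int, int]


def bio_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    start, kind = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        inside = tag.startswith("I-") and kind == tag[2:]
        if inside:
            continue
        if kind is not None:
            spans.add((kind, start, i))
        if tag.startswith("B-"):
            start, kind = i, tag[2:]
        else:
            # "O" or a stray "I-" tag without a matching "B-": no open span.
            start, kind = None, None
    return spans


def f_measure(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over exactly matching entity spans."""
    gold_spans, pred_spans = bio_spans(gold), bio_spans(pred)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Illustrative labels only (assumed, not from the paper's corpus).
gold = ["B-TIME", "I-TIME", "O", "B-POI", "I-POI", "I-POI"]
pred = ["B-TIME", "I-TIME", "O", "B-POI", "I-POI", "O"]
p, r, f1 = f_measure(gold, pred)
```

In this toy case the time expression matches exactly while the point-of-interest span is cut short, so only one of two predicted spans counts as correct. Exact-span matching is one common convention; the paper may score partial matches differently.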

Abstract:

Weibo is free, massive, and up to date, and it has become a new data source for emergency information. Collecting emergency-related information from Weibo in a timely manner helps decision makers respond immediately and allocate emergency resources reasonably. It is therefore important to study methods for identifying emergency information from Weibo accurately and effectively. Spatio-temporal information, moreover, is key to describing emergencies. To address the low precision and coarse granularity of the spatio-temporal information that existing methods identify from Weibo, this paper proposes MFISIE, a method for identifying fine spatio-temporal information of emergencies from Weibo. First, the paper designs a classified-hierarchical spatio-temporal information annotation system (CHSIAS) suited to emergency information from Weibo and constructs a Weibo emergency corpus. Then, to identify fine spatio-temporal information, it builds a Weibo spatio-temporal information identification model based on bidirectional long short-term memory and a conditional random field (BiLSTM-CRF). Finally, it runs experiments on 117 567 lines of labeled Weibo corpus. The results show that, compared with the People's Daily corpus annotation system, CHSIAS achieves a higher F-measure and yields multi-level, refined spatio-temporal information of emergencies when combined with CRF, BiLSTM, and BiLSTM-CRF respectively. The F-measure of MFISIE is the highest, reaching 91.2%. MFISIE performs best on time information; for position information, its F-measures for POIs, buildings, and relative positions are 8.8%, 6.3%, and 12.3% higher than the BiLSTM-based results, and 7.1%, 7.7%, and 8.9% higher than the CRF-based results, respectively. MFISIE can thus extract fine spatio-temporal information of emergencies from Weibo effectively and accurately, and can provide better technical support for the rapid perception and precise application of urban emergency information.
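In a BiLSTM-CRF tagger, the BiLSTM typically produces a per-token score for each tag, and the CRF layer adds tag-to-tag transition scores and picks the best whole sequence with Viterbi decoding. The sketch below illustrates only that decoding step in plain Python; the three-tag set, the score values, and the function name `viterbi` are assumptions made for illustration and do not reproduce the paper's model or parameters.

```python
from typing import List


def viterbi(emissions: List[List[float]],
            transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring tag index sequence.

    emissions[t][j]   : score of tag j at position t (e.g. from a BiLSTM)
    transitions[i][j] : score of moving from tag i to tag j (CRF layer)
    """
    n_tags = len(transitions)
    score = list(emissions[0])      # best score of any path ending in each tag
    backptr: List[List[int]] = []   # best previous tag, per position and tag
    for emit in emissions[1:]:
        new_score, ptr = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags),
                         key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            ptr.append(best_i)
        score = new_score
        backptr.append(ptr)
    # Follow back-pointers from the best final tag to recover the path.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Hypothetical tag set: 0 = O, 1 = B, 2 = I.
# The large negative score makes the invalid move O -> I effectively forbidden.
transitions = [
    [0.0, 0.0, -10000.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
# Made-up emission scores for a three-token sentence.
emissions = [
    [3.0, 1.0, 0.0],
    [0.0, 1.0, 2.0],
    [0.0, 0.0, 2.0],
]
best_path = viterbi(emissions, transitions)
```

Choosing the top-scoring tag independently at each position would give O, I, I here, which is not a valid BIO sequence. Decoding over whole sequences with the transition scores returns O, B, I instead, which is the usual argument for placing a CRF on top of a BiLSTM.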

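Basic information:

Chinese Library Classification (CLC) number: TP391.1

Citation:

[1] WU Jianhua, HU Lieyun, ZHAO Yu, et al. A method for fine identification of spatio-temporal information of emergencies in Weibo based on BiLSTM-CRF and classified-hierarchical annotation[J]. Geography and Geo-Information Science, 2021, 37(03): 1-8.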

