WEI Mingfei, PAN Ji, CHEN Zhimin, et al. Aerospace Intelligence Entity Recognition Method Based on Pre-Training Model[J]. Journal of Huaqiao University (Natural Science), 2021, 42(6): 831-837. [doi:10.11830/ISSN.1000-5013.202103038]

Aerospace Intelligence Entity Recognition Method Based on Pre-Training Model

Journal of Huaqiao University (Natural Science) [ISSN: 1000-5013 / CN: 35-1079/N]

Volume:
Vol. 42
Issue:
No. 6, 2021
Pages:
831-837
Publication Date:
2021-11-12

Article Information

Title:
Aerospace Intelligence Entity Recognition Method Based on Pre-Training Model
Article Number:
1000-5013(2021)06-0831-07
Author(s):
WEI Mingfei 1,2, PAN Ji 3, CHEN Zhimin 1,2, MEI Xiaohua 4, SHI Huipeng 3
1. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China; 2. National Space Science Center, Beijing 100190, China; 3. State Radio Monitoring Center, Beijing 100037, China; 4. College of Information Science and Engineering, Huaqiao University, Xiamen 361021, China
Keywords:
aerospace intelligence processing; pre-training; information extraction; named entity recognition; information science
CLC Number:
V19:G352;TP391
DOI:
10.11830/ISSN.1000-5013.202103038
Document Code:
A
Abstract:
In order to process aerospace intelligence quickly, based on data-driven deep learning technology, a workflow that fuses multi-source heterogeneous knowledge to label a Chinese aerospace intelligence data set is proposed, and an aerospace intelligence entity recognition (AIER) method based on pre-training models is developed. By performing named entity recognition on aerospace intelligence texts, the goal of information extraction from aerospace intelligence is achieved. The AIER model (BERT-CRF model) is constructed by fusing the bidirectional encoder representations from transformers (BERT) pre-training model with the conditional random field (CRF) model, and it is compared in entity recognition experiments with the hidden Markov model (HMM), the CRF model, and the bidirectional long short-term memory network with conditional random field (BiLSTM-CRF) model. The results show that the AIER model based on the pre-training model achieves 93.68% precision, 97.56% recall and a 95.58% F1 value; compared with the other methods, the pre-training-based method yields improved performance.
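
The reported scores are internally consistent when the first figure is read as precision: with P = 93.68% and R = 97.56%, the balanced F-measure is

F1 = 2PR / (P + R) = (2 × 0.9368 × 0.9756) / (0.9368 + 0.9756) ≈ 0.9558,

which matches the reported 95.58% F1 value.

As an illustrative sketch only (not the authors' released code), a minimal BERT-CRF tagger of the kind described in the abstract could be assembled as follows. It assumes the Hugging Face transformers package and the pytorch-crf package; the BIO tag set and the example sentence are hypothetical placeholders, not taken from the paper's dataset.

import torch.nn as nn
from transformers import BertModel, BertTokenizerFast
from torchcrf import CRF  # provided by the pytorch-crf package (assumed dependency)

class BertCrfTagger(nn.Module):
    """BERT encoder followed by a linear emission layer and a CRF for sequence labeling."""
    def __init__(self, num_tags, pretrained="bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(pretrained)
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        hidden = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        emissions = self.classifier(hidden)   # per-token tag scores
        mask = attention_mask.bool()
        if tags is not None:
            # Training: negative log-likelihood of the gold tag sequence under the CRF
            return -self.crf(emissions, tags, mask=mask, reduction="mean")
        # Inference: Viterbi decoding of the most likely tag sequence per sentence
        return self.crf.decode(emissions, mask=mask)

# Hypothetical BIO tag set for aerospace entities (illustration only)
TAGS = ["O", "B-SAT", "I-SAT", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]
tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
model = BertCrfTagger(num_tags=len(TAGS))
batch = tokenizer(["星链卫星由SpaceX公司发射"], return_tensors="pt")
pred = model(batch["input_ids"], batch["attention_mask"])  # list of tag-index sequences

In such a setup the CRF layer models label-transition constraints on top of BERT's contextual token representations, which is the fusion the abstract refers to as the BERT-CRF model.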


Memo
Received: 2021-03-25
Corresponding author: SHI Huipeng (born 1986), engineer, Ph.D.; his research focuses on the technical management and analysis of space service frequency and orbit resources. E-mail: shihuipeng@srrc.org.cn.
Foundation items: National Key Research and Development Program of China (2020YFB1807900, 2020YFB1806103); National Natural Science Foundation of China (91738101)
http://www.hdxb.hqu.edu.cn
Last Update: 2021-11-20