[1] LIN Cuiping,WU Yangyang.Person Name Disambiguation Based on Revised Longest Common Subsequence[J].Journal of Huaqiao University(Natural Science),2016,37(2):201-206.[doi:10.11830/ISSN.1000-5013.2016.02.0201]

Person Name Disambiguation Based on Revised Longest Common Subsequence

Journal of Huaqiao University (Natural Science) [ISSN: 1000-5013 / CN: 35-1079/N]

Volume:
Vol. 37
Issue:
No. 2, 2016
Pages:
201-206
Section:
Publication date:
2016-03-20

Article Info

Title:
Person Name Disambiguation Based on Revised Longest Common Subsequence
Article ID:
1000-5013(2016)02-0201-06
Author(s):
LIN Cuiping, WU Yangyang
College of Computer Science and Technology, Huaqiao University, Xiamen, Fujian 361021, China
Keywords:
person name disambiguation; text similarity; longest common subsequence; hierarchical clustering
CLC number:
TP391
DOI:
10.11830/ISSN.1000-5013.2016.02.0201
Document code:
A
Abstract:
This paper takes nouns, adjectives, gerunds, and named entities as text features, accounts for word order and word frequency, and incorporates the semantics of the feature terms to propose a text clustering method based on a revised longest common subsequence (LCSC). Experimental results show that, compared with the traditional cosine-similarity clustering method, LCSC raises the average F-measure on the P-IP metric for person name disambiguation from 74.2% to 84.9%. Compared with the longest common subsequence method, overall performance also improves by 3.7%.
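
The abstract builds on the longest common subsequence (LCS) as an order-sensitive measure of text similarity. As a rough illustration of that baseline idea only, the sketch below computes the classic LCS length over two sequences of feature terms and normalizes it into a similarity score. It is not the revised method proposed in the paper: the function names and the normalization 2*LCS/(len(a)+len(b)) are assumptions made for this example, and word frequency and term semantics are left out.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences.

    Classic dynamic programming, keeping only one previous row.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the common subsequence ending before both items.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the better of the two neighbours.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a, b):
    """Hypothetical normalization (an assumption, not from the paper)."""
    if not a or not b:
        return 0.0
    return 2.0 * lcs_length(a, b) / (len(a) + len(b))


# Two made-up feature-term sequences for documents mentioning the same name.
doc1 = ["professor", "database", "university", "data", "mining"]
doc2 = ["professor", "university", "data", "mining", "award"]
score = lcs_similarity(doc1, doc2)
```

Because the subsequence must respect term order, two documents that share the same terms in a different order score lower than under a bag-of-words cosine measure, which is the property the abstract's word-order consideration relies on.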

References:

[1] ARTILES J,GONZALO J,VERDEJO F.A testbed for people searching strategies in the WWW[C]//Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.Piscataway:ACM,2005:569-570.
[2] BAGGA A,BALDWIN B.Entity-based cross-document coreferencing using the vector space model[C]//Proceedings of the 17th International Conference on Computational Linguistics.Boston:Association for Computational Linguistics,1998:79-85.
[3] MANN G S,YAROWSKY D.Unsupervised personal name disambiguation[C]//Proceedings of the 7th Conference on Natural Language Learning at HLT-NAACL.Edmonton:Association for Computational Linguistics,2003:33-40.
[4] PEDERSEN T,PURANDARE A,KULKARNI A.Name discrimination by clustering similar contexts[C]//Computational Linguistics and Intelligent Text Processing.Berlin:Springer Berlin Heidelberg,2005:226-237.
[5] CHEN Y,MARTIN J.Towards robust unsupervised personal name disambiguation[C]//EMNLP-CoNLL.Washington D C:IEEE Press,2007:190-198.
[6] IKEDA M,ONO S,SATO I,et al.Person name disambiguation on the web by two-stage clustering[C]//2nd Web People Search Evaluation Workshop.New York:Association for Computing Machinery,2009:33-38.
[7] YANG Xia,JIN Peng,XIANG Wei.Exploring word similarity to improve Chinese personal name disambiguation[C]//Web Intelligence and Intelligent Agent Technology.Washington D C:IEEE Press,2011:197-200.
[8] SALTON G,WONG A,YANG C S.A vector space model for automatic indexing[J].Communications of the ACM,1975,18(11):613-620.
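[9] DONG Zhendong,DONG Qiang.Introduction to HowNet[EB/OL].[2014-03-16].http://www.keenage.com.
[10] LIU Qun,LI Sujian.Word semantic similarity computation based on HowNet[J].Computational Linguistics and Chinese Language Processing,2002,7(2):59-76.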
[11] WAGNER R A,FISCHER M J.The string-to-string correction problem[J].Journal of the ACM(JACM),1974,21(1):168-173.
[12] HIRSCHBERG D S.A linear space algorithm for computing maximal common subsequences[J].Communications of the ACM,1975,18(6):341-343.
[13] SHI Congying,XU Chaojun,YANG Xiaojiang.Survey of TFIDF algorithm research[J].Journal of Computer Applications,2009,29(B6):167-170.
[14] HIRSCHBERG D S.Algorithms for the longest common subsequence problem[J].Journal of the ACM,1977,24(4):664-675.
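[15] QUAN Fanglei.Application of data feature extraction in high-speed railway train-to-ground transmission[D].Hangzhou:Zhejiang University,2013:39-40.
[16] NIU Yongjie,ZHANG Cheng.Comparative study of several string similarity algorithms[J].Computer and Digital Engineering,2012,40(3):14-17.
[17] ZHANG Xin.Research and implementation of key technologies for person name disambiguation[D].Harbin:Harbin Institute of Technology,2012:32-33.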

Memo:
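Received: 2014-08-31
Corresponding author: WU Yangyang (1957-), female, professor, Ph.D.; research interests: database technology and data mining. E-mail: wuyy@hqu.edu.cn.
Funding: Major Project of the Fujian Provincial Science and Technology Plan (2011H6016); Key Project of the Fujian Provincial Science and Technology Plan (2011H0028)
Last Update: 2016-03-20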