[1] YU Li, YASEN·AIZEZI. Uyghur Retrieval Word Weighting Scheme Using Relevance Feedback and Document Similarity[J]. Journal of Huaqiao University (Natural Science), 2017, 38(3): 408-413. [doi:10.11830/ISSN.1000-5013.201703022]

Uyghur Retrieval Word Weighting Scheme Using Relevance Feedback and Document Similarity

Journal of Huaqiao University (Natural Science) [ISSN: 1000-5013 / CN: 35-1079/N]

Volume:
38
Issue:
2017, No. 3
Pages:
408-413
Column:
Publication date:
2017-05-20

Article Info

Title:
Uyghur Retrieval Word Weighting Scheme Using Relevance Feedback and Document Similarity
Article ID:
1000-5013(2017)03-0408-06
Author(s):
YU Li (于丽), YASEN·AIZEZI (亚森·艾则孜)
Department of Information Security Engineering, Xinjiang Police College, Urumqi 830011, China
Keywords:
Uyghur; document retrieval; retrieval word weighting; relevance feedback; document similarity
CLC number:
TP391
DOI:
10.11830/ISSN.1000-5013.201703022
Document code:
A
Abstract:
To retrieve Uyghur web documents effectively, a retrieval word weighting method based on relevance feedback and document similarity is proposed. First, the Uyghur documents are pre-processed to obtain the corresponding stem set. Then, when the user enters several retrieval words, an initial search is executed and the top-ranked N documents are extracted, following the idea of local relevance feedback. Next, the TF-IDF algorithm is used to compute the term-frequency similarity between the retrieval words and the feedback documents, and the cosine distance is used to compute the similarity between documents; the retrieval words are weighted twice on this basis. Finally, document retrieval is performed with the weighted retrieval words. Experimental results show that the proposed method accurately retrieves the documents the user needs and ranks them near the top.
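The abstract does not give the weighting formulas, so the following Python sketch only illustrates the general pipeline it describes: an initial search, selection of the top N documents as pseudo-feedback, a TF-IDF based weight, and a second weight derived from cosine similarity between the feedback documents. The function names, the smoothed IDF, the "coherence" score, and the way the two weights are multiplied together are all assumptions made for illustration and are not taken from the paper. English placeholder tokens stand in for Uyghur stems.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse {term: weight} TF-IDF vector per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Assumed variant: relative term frequency times (ln(N / df) + 1).
        vectors.append({
            term: (count / len(doc)) * (math.log(n_docs / doc_freq[term]) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query_weights, vectors):
    """Return document indices sorted by weighted TF-IDF score, best first."""
    scores = [sum(w * vec.get(t, 0.0) for t, w in query_weights.items())
              for vec in vectors]
    return sorted(range(len(vectors)), key=lambda i: (-scores[i], i))


def reweight_query(query, vectors, top_n):
    """Weight each query term twice using the top_n pseudo-feedback documents."""
    # Initial retrieval with uniform weights, then keep the top_n documents.
    feedback = rank({t: 1.0 for t in query}, vectors)[:top_n]
    if not feedback:
        return {t: 1.0 for t in query}
    # Hypothetical "coherence": mean cosine similarity to the other feedback docs.
    coherence = {}
    for d in feedback:
        others = [e for e in feedback if e != d]
        coherence[d] = (
            sum(cosine(vectors[d], vectors[e]) for e in others) / len(others)
            if others else 1.0
        )
    total = sum(coherence.values())
    weights = {}
    for term in query:
        # First weighting: mean TF-IDF of the term over the feedback documents.
        w_tf = sum(vectors[d].get(term, 0.0) for d in feedback) / len(feedback)
        # Second weighting: the same average, with documents weighted by coherence.
        w_sim = (
            sum(coherence[d] * vectors[d].get(term, 0.0) for d in feedback) / total
            if total else 0.0
        )
        weights[term] = (1.0 + w_tf) * (1.0 + w_sim)
    return weights


# Toy corpus of already-stemmed documents (placeholder tokens).
docs = [
    ["book", "read", "book", "school"],
    ["book", "school", "student"],
    ["market", "bread", "market"],
    ["book", "market"],
]
vectors = tfidf_vectors(docs)
weights = reweight_query(["book", "school"], vectors, top_n=2)
ranking = rank(weights, vectors)
print(weights)
print(ranking)
```

In this toy run the two documents that mention both query terms serve as feedback, both query terms end up with weights above 1, and the document that shares no term with the query is ranked last. A real system would replace the placeholder tokens with the output of a Uyghur stemmer and would use the paper's own weighting formulas.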


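Memo

Received: 2016-05-10
Corresponding author: YASEN·AIZEZI (1975-), male, professor; research interests: information security and natural language processing. E-mail: yulixjpc@126.com.
Funding: Supported by the Natural Science Foundation of Xinjiang Uygur Autonomous Region (2015211A016)
Last Update: 2017-05-20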