[1] CHEN Ye-wang, YU Jin-shan. An Improved Text Classification Method Based on Bayes[J]. Journal of Huaqiao University (Natural Science), 2011, 32(4): 401-404. [doi:10.11830/ISSN.1000-5013.2011.04.0401]

An Improved Text Classification Method Based on Bayes

Journal of Huaqiao University (Natural Science) [ISSN: 1000-5013 / CN: 35-1079/N]

Volume:
32
Issue:
2011, No. 4
Pages:
401-404
Publication date:
2011-07-20

Article Info

Title:
An Improved Text Classification Method Based on Bayes
Article ID:
1000-5013(2011)04-0401-04
Author(s):
CHEN Ye-wang (陈叶旺), YU Jin-shan (余金山)
College of Computer Science and Technology, Huaqiao University, Quanzhou 362021, China
Keywords:
text categorization; Naive Bayes; text feature; chi-square test
CLC number:
TP391.1
DOI:
10.11830/ISSN.1000-5013.2011.04.0401
Document code:
A
Abstract:
The web holds a large volume of text resources in unstructured forms such as web pages. To classify them, this paper proposes an improved Naive Bayes classification method. First, document features are computed with the chi-square test and the documents are reduced in dimensionality, which strengthens the discriminative information carried by the feature words. Then, these text features replace the original terms as the input to Naive Bayes classification. Experiments show that the method is, in theory, easy to build and update, and that classification precision also improves.
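The two steps named in the abstract can be sketched roughly as follows. This is an illustration only, not the authors' implementation: the abstract does not give the exact feature construction, so the sketch assumes the standard chi-square statistic over a 2x2 term/class contingency table, keeps the top-k terms, and then applies a multinomial Naive Bayes classifier with Laplace smoothing. The function names (`chi_square`, `select_features`, `train_nb`, `classify`) and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict


def chi_square(term, label, docs):
    """Chi-square score of a (term, class) pair from a 2x2 contingency table."""
    n = len(docs)
    a = sum(1 for words, y in docs if term in words and y == label)      # term present, in class
    b = sum(1 for words, y in docs if term in words and y != label)      # term present, other class
    c = sum(1 for words, y in docs if term not in words and y == label)  # term absent, in class
    d = n - a - b - c                                                    # term absent, other class
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def select_features(docs, k):
    """Keep the k terms whose best per-class chi-square score is highest."""
    labels = {y for _, y in docs}
    vocab = {w for words, _ in docs for w in words}
    score = {w: max(chi_square(w, y, docs) for y in labels) for w in vocab}
    # Ties are broken alphabetically so the result is deterministic.
    return set(sorted(vocab, key=lambda w: (-score[w], w))[:k])


def train_nb(docs, features):
    """Count class frequencies and per-class counts of the selected terms."""
    class_docs = Counter(y for _, y in docs)
    term_counts = defaultdict(Counter)
    for words, y in docs:
        term_counts[y].update(w for w in words if w in features)
    return class_docs, term_counts, features


def classify(model, words):
    """Return the class with the highest log-posterior (Laplace smoothing)."""
    class_docs, term_counts, features = model
    total_docs = sum(class_docs.values())
    best_label, best_logp = None, -math.inf
    for y in sorted(class_docs):
        logp = math.log(class_docs[y] / total_docs)
        denom = sum(term_counts[y].values()) + len(features)
        for w in words:
            if w in features:  # terms outside the reduced feature set are ignored
                logp += math.log((term_counts[y][w] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = y, logp
    return best_label


# Toy corpus, invented for illustration: (tokens, class label).
docs = [
    (["ball", "goal", "team", "the"], "sport"),
    (["team", "score", "goal", "the"], "sport"),
    (["stock", "market", "price", "the"], "finance"),
    (["market", "bank", "price", "the"], "finance"),
]
features = select_features(docs, 4)
model = train_nb(docs, features)
print(sorted(features))
print(classify(model, ["goal", "team", "the"]))
```

In this toy corpus the word "the" occurs in every document, so its chi-square score is zero and it is dropped during selection. That is the sense in which the reduction step sharpens the discriminative information passed to the classifier.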


Memo:
Supported by the Natural Science Foundation of Fujian Province (A0810013) and the Huaqiao University Research Start-up Project for High-Level Talents (09BS619)
Last Update: 2014-03-23