[1] YU Xiao-guang, CHEN Wei-bin, CHEN Rong-xin. Research and Realization of Approximate Data Mining Based on Data Reduction[J]. Journal of Huaqiao University (Natural Science), 2008, 29(3): 370-374. [doi:10.11830/ISSN.1000-5013.2008.03.0370]

Research and Realization of Approximate Data Mining Based on Data Reduction

Journal of Huaqiao University (Natural Science) [ISSN: 1000-5013 / CN: 35-1079/N]

Volume:
29
Issue:
No. 3, 2008
Pages:
370-374
Column:
Publication date:
2008-07-20

Article Info

Title:
Research and Realization of Approximate Data Mining Based on Data Reduction
Article ID:
1000-5013(2008)03-0370-05
Author(s):
YU Xiao-guang (喻小光), CHEN Wei-bin (陈维斌), CHEN Rong-xin (陈荣鑫)
College of Information Science and Engineering, Huaqiao University, Quanzhou 362021, Fujian, China
Keywords:
approximate mining; data reduction; attribute selection; instance selection
CLC number:
TP311.13
DOI:
10.11830/ISSN.1000-5013.2008.03.0370
Document code:
A
Abstract:
This paper discusses an approximate data mining technique based on data reduction, in which a massive data set is reduced during the data preprocessing phase. The approximate data mining workflow comprises five phases: task definition, data preparation and preprocessing, data mining modeling, interpretation and evaluation of results, and model publication and application. The paper also proposes a scheme that uses attribute selection and instance selection to realize approximate mining, and analyzes and evaluates the scheme's mining efficiency and the accuracy of the resulting model. The scheme can meet the need for efficient mining of enterprise-level large data sets.
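
The abstract names attribute selection and instance selection as the two data reduction steps but does not say which algorithms the authors used. The following Python sketch is only an illustration of the general idea, under assumptions that are not from the paper: attributes are discrete and ranked by information gain, and instances are reduced by stratified random sampling. All function names (`select_attributes`, `select_instances`, `reduce_dataset`) and parameters (`k`, `fraction`, `seed`) are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Drop in label entropy obtained by splitting on one discrete attribute."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[attr]].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def select_attributes(rows, labels, k):
    """Attribute selection (assumed criterion): keep the k highest-gain attributes."""
    ranked = sorted(rows[0].keys(),
                    key=lambda a: information_gain(rows, labels, a),
                    reverse=True)
    return ranked[:k]


def select_instances(labels, fraction, seed=0):
    """Instance selection (assumed method): stratified random sample of row indices."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    keep = []
    for indices in by_class.values():
        # Keep at least one instance per class so no class disappears.
        n = max(1, round(len(indices) * fraction))
        keep.extend(rng.sample(indices, n))
    return sorted(keep)


def reduce_dataset(rows, labels, k, fraction, seed=0):
    """Apply both reductions and return the smaller data set for mining."""
    attrs = select_attributes(rows, labels, k)
    keep = select_instances(labels, fraction, seed)
    reduced_rows = [{a: rows[i][a] for a in attrs} for i in keep]
    reduced_labels = [labels[i] for i in keep]
    return reduced_rows, reduced_labels
```

Here `rows` is a list of dictionaries mapping attribute names to discrete values and `labels` is the matching list of class labels. A mining model would then be trained on the output of `reduce_dataset` instead of the full data set, trading some accuracy for a shorter run time, which is the trade-off the abstract says the paper evaluates.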

Memo:
Supported by the Fujian Provincial Youth Science and Technology Talent Innovation Fund (2002J011) and the Huaqiao University Research Fund (04HZR17)
Last Update: 2014-03-23