

Improved Mining Algorithm for Frequent and High Utility Itemsets


Journal of Huaqiao University (Natural Science) [ISSN:1000-5013/CN:35-1079/N]

Volume:: 38
Issue:: No. 6, 2017

Pages:: 880-885


Publication date:: 2017-11-20

Article Info

Title:: Improved Mining Algorithm for Frequent and High Utility Itemsets

Article ID:: 1000-5013(2017)06-0880-06

Author(s):: ZHANG Jian (张健); LIU Shaotao (刘韶涛); College of Computer Science and Technology, Huaqiao University, Xiamen, Fujian 361021, China

Keywords:: frequent itemsets; high utility itemsets; pseudo projection; transaction merging

CLC number:: TP311

DOI:: 10.11830/ISSN.1000-5013.201603067

Document code:: A

Abstract:: A new upper-bound pruning method based on local utility quality values is proposed, and a pseudo-projection technique is introduced to avoid physically constructing projected databases. Building on these two ideas, an improved algorithm, FHIMA-P, is proposed. Transaction merging and projected-transaction merging are then added to FHIMA-P to obtain the final FHIMA-MP algorithm, and experiments are conducted on the mushroom and accident datasets. The results show that FHIMA-P runs faster than FHIMA-ALL, and that FHIMA-MP is substantially more efficient than both. Moreover, the large number of mergeable transactions (and projected transactions) found in the mushroom and accident datasets under different parameter settings further confirms the effectiveness of transaction (and projected-transaction) merging.
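The abstract names two techniques without showing them, so the following is a minimal Python sketch of the general ideas as they are commonly used in pattern mining. It is not the authors' FHIMA-P or FHIMA-MP implementation. The data layout (a transaction as a tuple of `(item, utility)` pairs in a fixed item order) and the function names `merge_transactions` and `pseudo_project` are assumptions made for illustration only.

```python
# Illustrative sketch only; the layout and names below are hypothetical,
# not taken from the paper.

def merge_transactions(transactions):
    """Merge transactions that contain exactly the same items.

    Each group of identical transactions is kept once, with a support
    count and the per-item utilities summed position by position.
    """
    merged = {}
    for t in transactions:
        items = tuple(item for item, _ in t)
        utils = [u for _, u in t]
        if items in merged:
            entry = merged[items]
            entry["count"] += 1
            entry["utils"] = [a + b for a, b in zip(entry["utils"], utils)]
        else:
            merged[items] = {"count": 1, "utils": utils}
    return merged


def pseudo_project(transactions, item):
    """Project the database on `item` without copying any transaction.

    Returns (transaction index, offset) pairs, where the offset points
    just past `item`, so the suffix can be read from the original data.
    """
    pointers = []
    for idx, t in enumerate(transactions):
        for pos, (it, _) in enumerate(t):
            if it == item:
                pointers.append((idx, pos + 1))
                break
    return pointers
```

In this sketch, two transactions with items `a, b, c` collapse into one entry with a count of 2, which is the kind of saving the reported merge counts on mushroom and accident suggest. The pointer list returned by `pseudo_project` stands in for a physical projected database, at the cost of one extra index lookup per access.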


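Memo

Memo:: Received: 2016-03-24
Corresponding author: LIU Shaotao (born 1969), male, associate professor, whose research focuses on software architecture and software reuse. E-mail: shaotaol@hqu.edu.cn.
Funding: Major Project of the Fujian Provincial Science and Technology Program (2011H6016)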


Last Update: 2017-11-20