Interval-Valued Global Approximate Reduction of Multi-Decision Tables in a Big Data Environment (大数据环境下多决策表的区间值全局近似约简)

doi:10.13328/j.cnki.jos.004640


2014, Vol. 25, No. 9, pp. 2119-2135

English title: Approaches to Approximate Reduction with Interval-Valued Multi-Decision Tables in Big Data

Authors:
XU Fei-Fei (徐菲菲), College of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 200090, China
LEI Jing-Sheng (雷景生), College of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 200090, China
BI Zhong-Qin (毕忠勤), College of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 200090, China
MIAO Duo-Qian (苗夺谦), College of Electronic and Information Engineering, Tongji University, Shanghai 200092, China
DU Hai-Zhou (杜海舟), College of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 200090, China

Fund project: National Natural Science Foundation of China (61272437, 60305094); Innovation Program of Shanghai Municipal Education Commission (12YZ140, 14YZ131); Natural Science Foundation of Shanghai (13ZR1417500)


Abstract:

In electric power big data, many applications, such as load forecasting and fault diagnosis, must assign a class based on how the data change over a period of time; classifying a single data record in isolation is meaningless. For this reason, interval-valued rough sets are introduced into big data classification. From the algebraic view and the information view respectively, this paper gives the definitions and property proofs for interval-valued heuristic reduction based on attribute dependency and on mutual information, together with the corresponding algorithms. These results extend interval-valued rough set theory and offer an approach to the analysis of big data. To accommodate the distributed storage architecture of big data, the paper further proposes the concept of interval-valued global reduction over multi-decision tables, proves its properties, and gives the corresponding algorithm. To make the algorithms more effective in practice, approximate reduction is introduced into all three algorithms. They are validated by steady-state determination on the operating data of one 600MW unit of a power plant from the first half of 2012. The experimental results show that, with proper parameters, all three algorithms maintain high classification accuracy while greatly reducing the data set in both the number of objects and the number of attributes, thereby supporting further analysis and processing of big data.

Key words: big data; interval value; approximate reduction; multi-decision table; global reduction
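
The dependency-based heuristic reduction described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything concrete here is an assumption made for illustration: the interval similarity measure (overlap length divided by combined span), the similarity threshold `lam`, the tolerance `eps` that stands in for approximate reduction, and all function names. It is not the authors' algorithm, only one plausible reading of the idea.

```python
from typing import List, Sequence, Set, Tuple

Interval = Tuple[float, float]
Table = Sequence[Sequence[Interval]]  # rows are objects, columns are attributes


def similarity(a: Interval, b: Interval) -> float:
    """Assumed measure: overlap length divided by the combined span."""
    overlap = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    span = max(a[1], b[1]) - min(a[0], b[0])
    if span == 0:
        return 1.0  # both intervals are the same single point
    return overlap / span


def tolerance_class(table: Table, i: int, attrs: Sequence[int], lam: float) -> Set[int]:
    """Objects that are at least lam-similar to object i on every attribute in attrs."""
    return {
        j for j in range(len(table))
        if all(similarity(table[i][a], table[j][a]) >= lam for a in attrs)
    }


def dependency(table: Table, labels: Sequence[str], attrs: Sequence[int], lam: float) -> float:
    """Fraction of objects whose tolerance class carries a single decision label."""
    positive = 0
    for i in range(len(table)):
        cls = tolerance_class(table, i, attrs, lam)
        if all(labels[j] == labels[i] for j in cls):
            positive += 1
    return positive / len(table)


def greedy_reduct(table: Table, labels: Sequence[str], lam: float, eps: float = 0.0) -> List[int]:
    """Add the attribute with the largest dependency gain until the dependency
    is within eps of the value obtained with all attributes (eps is hypothetical)."""
    all_attrs = list(range(len(table[0])))
    target = dependency(table, labels, all_attrs, lam)
    chosen: List[int] = []
    current = dependency(table, labels, chosen, lam)
    while target - current > eps and len(chosen) < len(all_attrs):
        remaining = [a for a in all_attrs if a not in chosen]
        best = max(remaining, key=lambda a: dependency(table, labels, chosen + [a], lam))
        chosen.append(best)
        current = dependency(table, labels, chosen, lam)
    return chosen


# Toy data (invented): attribute 0 separates the classes, attributes 1 and 2 are constant.
table = [
    [(0.0, 1.0), (5.0, 6.0), (0.0, 10.0)],
    [(0.1, 1.1), (5.0, 6.0), (0.0, 10.0)],
    [(3.0, 4.0), (5.0, 6.0), (0.0, 10.0)],
    [(3.1, 4.1), (5.0, 6.0), (0.0, 10.0)],
]
labels = ["A", "A", "B", "B"]
reduct = greedy_reduct(table, labels, lam=0.5)
```

On this toy table only attribute 0 distinguishes the two classes, so the sketch selects `[0]`. Raising `eps` lets the loop stop before the full dependency is reached, which is the sense in which the reduction becomes approximate in this sketch.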


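Cite this article

XU Fei-Fei, LEI Jing-Sheng, BI Zhong-Qin, MIAO Duo-Qian, DU Hai-Zhou. Approaches to approximate reduction with interval-valued multi-decision tables in big data. Ruan Jian Xue Bao/Journal of Software, 2014,25(9):2119-2135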


History

Received: 2014-03-31
Last revised: 2014-05-14
Published online: 2014-09-09
