Sale Fraud Behavior Detection over Multidimensional Sparse Data Warehouse

doi:10.13328/j.cnki.jos.005905


Journal of Software, 2020, Vol. 31, No. 3, pp. 710-725

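Authors:
ZHENG Jiao-Ling (郑皎凌)
Sichuan Key Laboratory of Software Automatic Generation and Intelligent Service (Chengdu University of Information Technology), Chengdu 610225, China; School of Software Engineering, Chengdu University of Information Technology, Chengdu 610225, China
QIAO Shao-Jie (乔少杰)
Sichuan Key Laboratory of Software Automatic Generation and Intelligent Service (Chengdu University of Information Technology), Chengdu 610225, China; School of Software Engineering, Chengdu University of Information Technology, Chengdu 610225, China
SHU Hong-Ping (舒红平)
Sichuan Key Laboratory of Software Automatic Generation and Intelligent Service (Chengdu University of Information Technology), Chengdu 610225, China; School of Software Engineering, Chengdu University of Information Technology, Chengdu 610225, China
YING Guang-Hua (应广华)
Alibaba (China) Technology Co. Ltd., Hangzhou 311121, China
Louis Alberto GUTIERREZ
Department of Computer Science, Rensselaer Polytechnic Institute, New York, USA
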
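Author biographies: ZHENG Jiao-Ling (1981-), female, born in Chongqing, Ph.D., associate professor, CCF professional member; her main research areas are artificial intelligence, databases and knowledge engineering. YING Guang-Hua (1988-), male, M.S.; his main research areas are artificial intelligence and online financial risk control. QIAO Shao-Jie (1981-), male, postdoctoral researcher, professor, CCF senior member; his main research areas are mobile databases and data mining. Louis Alberto GUTIERREZ (1980-), male, Ph.D., researcher; his main research area is data mining. SHU Hong-Ping (1974-), male, Ph.D., professor, doctoral supervisor; his main research areas are databases and knowledge engineering.
Corresponding author: QIAO Shao-Jie, E-mail: sjqiao@cuit.edu.cn
Fund projects: National Natural Science Foundation of China (61772091, 61802035, 61962006); Sichuan Science and Technology Program (20YYJC2785, 2018JY0448, 2019YFG0106, 2019YFS0067); Innovative Research Team Construction Plan in Universities of Sichuan Province (18TD0027); Natural Science Foundation of Guangxi, China (2018GXNSFDA138005); Scientific Research Foundation for Advanced Talents of Chengdu University of Information Technology (KYTZ201715, KYTZ201750); Scientific Research Foundation for Young Academic Leaders of Chengdu University of Information Technology (J201701); Guangdong Province Key Laboratory of Popular High Performance Computers (2017B030314073)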



Abstract:

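In a distribution channel system, the manufacturer grants larger rebates to distributors with higher sales in order to encourage sales. Distributors may therefore collude and credit the combined sales of several distributors to a single one of them in order to obtain high rebates; this commercial fraud is known as deal cheating or cross-region selling. Because sales data contain many normal extreme values, traditional outlier detection algorithms have difficulty distinguishing normal extremes from the abnormal extremes produced by deal cheating. In addition, the inherent sparsity of multidimensional sales data prevents multidimensional outlier detection algorithms from working effectively. To overcome these problems, this study combines artificial intelligence and database techniques and proposes a feature extraction method based on split ratios and a deal cheating mining algorithm based on tensor reconstruction. Furthermore, because distributors can cheat in several different ways, a feature extraction method based on the partially ordered lattice of deal cheating patterns is designed to classify the deal cheating behaviors present in a sales dataset. In experiments on synthetic data, the proposed deal cheating point mining algorithm achieves an average AUC of 65%, whereas traditional feature extraction methods achieve average AUCs of only 36% and 30%. Experiments on real data show that the proposed method can distinguish normal sales extremes from the abnormal extremes produced by deal cheating.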

Key words: distribution channel fraud; artificial intelligence; deal cheating pattern; tensor; partially ordered lattice
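The abstract reports detection quality as average AUC values. As a reading aid, here is a minimal Python sketch of how ROC AUC is commonly computed. It is the fraction of (deal-cheating point, normal point) pairs in which the cheating point receives the higher anomaly score, with ties counted as one half, so 0.5 corresponds to a random ranking. The function `auc_score` and the toy data are hypothetical illustrations, not the authors' evaluation code. The sketch assumes binary labels (1 = deal cheating, 0 = normal) and that higher scores mean more suspicious.

```python
from typing import Sequence


def auc_score(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Return ROC AUC via the pairwise (Mann-Whitney) formulation.

    labels: 1 marks a deal-cheating point, 0 marks a normal point (assumed encoding).
    scores: anomaly scores from any detector; higher means more suspicious.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative point")

    # Count positive/negative pairs that the scores order correctly;
    # a tie contributes half a pair.
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical toy data: two cheating points, three normal points.
    labels = [1, 1, 0, 0, 0]
    scores = [0.9, 0.4, 0.5, 0.2, 0.1]
    print(round(auc_score(labels, scores), 3))  # 0.833
```

With the toy data, five of the six cheating/normal pairs are ranked correctly, giving 5/6 ≈ 0.833. The quadratic pair loop is chosen for readability. For large datasets, a sort-based rank computation yields the same value more efficiently.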



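Cite this article:

ZHENG Jiao-Ling, QIAO Shao-Jie, SHU Hong-Ping, YING Guang-Hua, Louis Alberto GUTIERREZ. Sale fraud behavior detection over multidimensional sparse data warehouse. Journal of Software, 2020, 31(3): 710-725.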

Article metrics

Views: 2748
Downloads: 5946
HTML views: 3347
Citations: 0

History

Received: 2019-07-20
Last revised: 2019-09-10
Published online: 2020-01-10
Published: 2020-03-06
