Sale Fraud Behavior Detection over Multidimensional Sparse Data Warehouse

Fund Project:

National Natural Science Foundation of China (61772091, 61802035, 61962006); Sichuan Science and Technology Program (20YYJC2785, 2018JY0448, 2019YFG0106, 2019YFS0067); Innovative Research Team Construction Plan in Universities of Sichuan Province (18TD0027); National Natural Science Foundation of Guangxi of China (2018GXNSFDA138005); Scientific Research Foundation for Advanced Talents of Chengdu University of Information Technology (KYTZ201715, KYTZ201750); Scientific Research Foundation for Young Academic Leaders of Chengdu University of Information Technology (J201701); Guangdong Province Key Laboratory of Popular High Performance Computers (2017B030314073)

    Abstract:

    In a distribution channel system, product manufacturers often reward retail traders who close big deals in order to increase sales. To obtain high rewards, however, retail traders may form an alliance in which one cheating retail trader accumulates the deals of the other retail traders. This type of commercial fraud is called deal cheating or cross-region sale. Because sales data contain many normal big deals, traditional outlier detection methods cannot distinguish a normal extreme value from a true outlier generated by deal cheating behavior. Moreover, the sparsity of multidimensional sales data prevents outlier detection methods based on multidimensional space from working effectively. To address these problems, this study proposes deal cheating mining algorithms based on the ratio characteristic and a tensor reconstruction method. These algorithms combine artificial intelligence and database techniques. In addition, because there are multiple types of deal cheating patterns, this study proposes deal cheating pattern classification methods based on the partially ordered lattice of deal cheating patterns. In experiments on synthetic data, the deal cheating detection algorithm based on the ratio characteristic achieves an average AUC value of 65%, whereas the traditional feature extraction methods achieve average AUC values of only 36% and 30%. In experiments on real data, the results show that the deal cheating detection algorithm is capable of distinguishing normal big deals from abnormal big deals that may be generated by deal cheating behaviors.

Get Citation

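Zheng Jiaoling, Qiao Shaojie, Shu Hongping, Ying Guanghua, Louis Alberto GUTIERREZ. Sale Fraud Behavior Detection over Multidimensional Sparse Data Warehouse. Journal of Software, 2020, 31(3): 710-725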

History
  • Received: July 20, 2019
  • Revised: September 10, 2019
  • Online: January 10, 2020
  • Published: March 06, 2020