Fisher Consistent Cost Sensitive Boosting Algorithm

Abstract:

    AdaBoost is a meta ensemble learning algorithm whose most important theoretical property is "Boosting", a property that also plays an important role in cost-sensitive learning. However, existing cost-sensitive Boosting algorithms, such as AdaCost, AdaC1, AdaC2, AdaC3, CSB0, CSB1 and CSB2, are merely heuristic: they insert cost parameters into the voting-weight formula or the sample-weight updating strategy of AdaBoost so that the algorithms are forced to focus on samples with higher misclassification costs. These heuristic modifications have no theoretical foundation, and worse, they break the most important theoretical property of AdaBoost, namely "Boosting". Whereas AdaBoost converges to the optimal Bayes decision rule, these cost-sensitive algorithms do not converge to the cost-sensitive decision rule. This paper studies the problem of designing cost-sensitive Boosting algorithms strictly within Boosting theory. First, two new loss functions are constructed by making the exponential loss and the logit loss cost-sensitive. The new loss functions are proved to be Fisher consistent in the cost-sensitive setting, so optimizing them ultimately leads to the cost-sensitive Bayes decision rule. Performing gradient descent in functional space to optimize these two loss functions then yields two new cost-sensitive Boosting algorithms: AsyB and AsyBL. Experiments on synthetic Gaussian data show that, compared with other cost-sensitive Boosting algorithms, AsyB and AsyBL consistently better approximate the cost-sensitive Bayes decision rule. Experiments on UCI datasets further show that AsyB and AsyBL generate better cost-sensitive classifiers with lower misclassification costs, and that the misclassification costs decrease exponentially with the number of iterations.
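    As an illustration of the Fisher-consistency argument, the sketch below checks numerically that the pointwise minimizer of a cost-sensitive exponential loss has the same sign as the cost-sensitive Bayes decision. The loss form and the symbols `C1`, `C2`, `p` are a common cost-sensitive extension of the exponential loss, used here only for illustration; they are assumptions, not necessarily the exact construction in the paper.

```python
import math

def loss(f, p, C1, C2):
    """Expected cost-sensitive exponential loss at a point where P(y=1|x) = p.

    C1 is the cost of misclassifying the positive class, C2 that of the
    negative class (illustrative notation, not the paper's).
    """
    return p * math.exp(-C1 * f) + (1 - p) * math.exp(C2 * f)

def minimizer(p, C1, C2):
    """Closed-form pointwise minimizer of the loss above.

    Setting the derivative to zero gives
        f* = log(C1*p / (C2*(1-p))) / (C1 + C2).
    """
    return math.log((C1 * p) / (C2 * (1 - p))) / (C1 + C2)

def bayes_decision(p, C1, C2):
    """Cost-sensitive Bayes rule: predict +1 iff C1*p > C2*(1-p)."""
    return 1 if C1 * p > C2 * (1 - p) else -1

# Fisher consistency check: the sign of the loss minimizer agrees with the
# cost-sensitive Bayes decision over a grid of posteriors and cost pairs.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    for C1, C2 in ((1.0, 1.0), (5.0, 1.0), (1.0, 3.0)):
        f_star = minimizer(p, C1, C2)
        assert f_star == 0 or (1 if f_star > 0 else -1) == bayes_decision(p, C1, C2)
```

    In the cost-insensitive case (C1 = C2 = 1) the minimizer reduces to half the log-odds, log(p/(1-p))/2, which is the classical result for AdaBoost's exponential loss; the cost parameters simply shift the decision threshold toward the cheaper error.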

    References
    [1] López V, Fernández A, Moreno-Torres JG, Herrera F. Analysis of preprocessing vs. cost-sensitive learning for imbalanced classification: Open problems on intrinsic data characteristics. Expert Systems with Applications, 2012,39(7):6585-6608. [doi: 10.1016/j.eswa.2011.12.043]
    [2] Lomax S, Vadera S. A survey of cost-sensitive decision tree induction algorithms. ACM Computing Surveys, 2013,45(2):16-50. [doi: 10.1145/2431211.2431215]
    [3] Domingos P. MetaCost: A general method for making classifiers cost-sensitive. In: Proc. of the 5th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining (KDD'99). New York: ACM Press, 1999. 155-164. [doi: 10.1145/312129.312220]
    [4] Elkan C. The foundations of cost-sensitive learning. In: Proc. of the 17th Int'l Joint Conf. on Artificial Intelligence (IJCAI 2001). San Francisco: Morgan Kaufmann Publishers, 2001. 973-978.
    [5] Zadrozny B, Langford J, Abe N. Cost-sensitive learning by cost-proportionate example weighting. In: Proc. of the 3rd IEEE Int'l Conf. on Data Mining (ICDM 2003). New York: IEEE, 2003. 435-442. [doi: 10.1109/ICDM.2003.1250950]
    [6] Krawczyk B, Woźniak M. Designing cost-sensitive ensemble-genetic approach. Advances in Intelligent and Soft Computing, 2011,102:227-234. [doi: 10.1007/978-3-642-23154-4_26]
    [7] Schapire RE, Freund Y. Boosting: Foundations and Algorithms. Cambridge: The MIT Press, 2012.
    [8] Fan W, Stolfo SJ, Zhang J, Chan PK. AdaCost: Misclassification cost-sensitive boosting. In: Proc. of the 16th Int'l Conf. on Machine Learning. San Francisco: Morgan Kaufmann Publishers, 1999. 97-105.
    [9] Sun Y, Wong AK, Wang Y. Parameter inference of cost-sensitive boosting algorithms. In: Proc. of the 4th Int'l Conf. on Machine Learning and Data Mining in Pattern Recognition. Berlin: Springer-Verlag, 2005. 21-30. [doi: 10.1007/11510888_3]
    [10] Sun Y, Kamel MS, Wong AK, Wang Y. Cost-sensitive boosting for classification of imbalanced data. Pattern Recognition, 2007,40(12):3358-3378. [doi: 10.1016/j.patcog.2007.04.009]
    [11] Ting KM. A comparative study of cost-sensitive boosting algorithms. In: Proc. of the 17th Int'l Conf. on Machine Learning (ICML 2000). San Francisco: Morgan Kaufmann Publishers, 2000. 983-990.
    [12] Masnadi-Shirazi H, Vasconcelos N. Asymmetric boosting. In: Proc. of the 24th Int'l Conf. on Machine Learning (ICML 2007). San Francisco: Morgan Kaufmann Publishers, 2007. 609-619. [doi: 10.1145/1273496.1273573]
    [13] Masnadi-Shirazi H, Vasconcelos N. Cost-sensitive boosting. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2011,33(2):294-309. [doi: 10.1109/TPAMI.2010.71]
    [14] Landesa-Vázquez I, Alba-Castro JL. Shedding light on the asymmetric learning capability of AdaBoost. Pattern Recognition Letters, 2011,33(3):247-255. [doi: 10.1016/j.patrec.2011.10.022]
    [15] Lin Y. A note on margin-based loss functions in classification. Statistics & Probability Letters, 2004,68(1):73-82. [doi: 10.1016/j.spl.2004.03.002]
    [16] Zou H, Zhu J, Hastie T. New multicategory boosting algorithms based on multicategory Fisher-consistent losses. The Annals of Applied Statistics, 2008,2(4):1290-1306. [doi: 10.1214/08-AOAS198]
    [17] Friedman J, Hastie T, Tibshirani R. Additive logistic regression: A statistical view of boosting (with discussion and a rejoinder by the authors). The Annals of Statistics, 2000,28(2):337-407. [doi: 10.1214/aos/1016218223]
Get Citation

曹莹, 苗启广, 刘家辰, 高琳. Fisher consistent cost sensitive Boosting algorithm. Journal of Software (软件学报), 2013,24(11):2584-2596.

History
  • Received: May 31, 2013
  • Revised: July 17, 2013
  • Online: November 01, 2013
Copyright: Institute of Software, Chinese Academy of Sciences