Open Source Software Classification Using Cost-Sensitive Multi-Label Learning
Author:
Affiliation:

Clc Number:

Fund Project:

  • Article
  • |
  • Figures
  • |
  • Metrics
  • |
  • Reference
  • |
  • Related
  • |
  • Cited by
  • |
  • Materials
  • |
  • Comments
    Abstract:

    With the explosive growth of open source software, retrieving desired software in open source software communities becomes a great challenge. Tagging open source software is usually a manual process which assigns software with several tags describing its functions and characteristics. Users can search their desired software by matching the keywords. Because of the simplicity and convenience, software retrieval based on tags has been widely used. However, since human effort is expensive and time-consuming, developers are not willing to tag software sufficiently when uploading software projects. Thus automatic software tagging, with tags describing functions and characteristics according to software projects’ text descriptions provided by users, becomes key to effective software retrieval. This article formalizes this problem as a multi-label learning problem and proposes a new multi-label learning method ML-CKNN which can effectively solve this problem when the number of different tags is extremely large. By imposing cost value of wrong classification into multi-label learning, ML-CKNN can effectively solve this imbalanced problem, as each tag instances associated with this tag are much less than those not associated with this tag. Experiments on three open source software community datasets show that ML-CKNN can provide high-quality tags for new uploading open source software while significantly outperforming existing methods.

    Reference
    Related
    Cited by
Get Citation

韩乐,黎铭.基于代价敏感多标记学习的开源软件分类.软件学报,2014,25(9):1982-1991

Copy
Share
Article Metrics
  • Abstract:
  • PDF:
  • HTML:
  • Cited by:
History
  • Received:March 24,2014
  • Revised:May 14,2014
  • Adopted:
  • Online: September 09,2014
  • Published:
You are the firstVisitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063