Efficiently Active Learning for Semi-Supervised Document Clustering
Author:
Affiliation:

Clc Number:

Fund Project:

  • Article
  • |
  • Figures
  • |
  • Metrics
  • |
  • Reference
  • |
  • Related
  • |
  • Cited by
  • |
  • Materials
  • |
  • Comments
    Abstract:

    Semi-Supervised document clustering and employing limited prior knowledge to aid in unsupervised clustering, have recently become a topic of significant interest to data mining and machine learning communities. Because receiving supervised data may be expensive, it is important to attain the most informative knowledge to improve the clustering performance. This paper presents a semi-supervised document clustering algorithm with active learning for pairwise constraints, aiming at getting improved clustering performance. The semi-supervised document clustering algorithm is a constrained DBSCAN (cons-DBSCAN) algorithm, which incorporates pairwise constraints to guide the clustering process in DBSCAN. Basing on measure of constraint set utility and analysis of DBSCAN algorithm, an active learning approach is proposed to select informative document pairs for obtaining user feedbacks. Experimental results show that this proposed approach is effective in document clustering. The clustering performance of active Cons-DBSCAN has dramatically improved with selected pairwise constraints. Moreover, the proposed approach performs better than the two representative methods.

    Reference
    Related
    Cited by
Get Citation

赵卫中,马慧芳,李志清,史忠植.一种结合主动学习的半监督文档聚类算法.软件学报,2012,23(6):1486-1499

Copy
Share
Article Metrics
  • Abstract:
  • PDF:
  • HTML:
  • Cited by:
History
  • Received:November 03,2010
  • Revised:March 31,2011
  • Adopted:
  • Online: June 05,2012
  • Published:
You are the firstVisitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063