DF or IDF? On the Use of Primary Feature Model for Web Information Retrieval

    Abstract:

    In Web information retrieval (IR), input queries are often too short and vague to describe the user's request, which causes a mismatch between the query and Web documents that are full of redundancy and noise. This paper first studies the characteristics of the information in Web documents and proposes the concepts of primary feature word, primary feature field, and primary feature space (PFS). It then proposes a new PFS query term weighting scheme that takes document frequency (DF) into account in place of the traditional inverse document frequency (IDF) factor. Finally, a strategy for combining term weights is given. Using this PFS model, three groups of experiments were performed on 10G and 19G large-scale Web collections with the standard TREC9, TREC10, and TREC11 Web track tests. Comparative studies indicate that the new DF-related PFS term weighting improves system performance consistently and effectively in terms of recall, top-n precision, and mean average precision, with a maximum improvement of 18.6%.
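
For readers unfamiliar with the two quantities contrasted in the title, the following is a minimal sketch of how DF and the classic IDF are usually computed over a collection. It is not the paper's PFS weighting scheme, whose formula is not given in this abstract. The toy corpus, the whitespace tokenizer, and the function names `document_frequency` and `idf` are all assumptions made for illustration.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Return, for each term, the number of documents that contain it (DF)."""
    df = Counter()
    for doc in docs:
        # Use a set so a term is counted once per document, not once per occurrence.
        df.update(set(doc.lower().split()))
    return df


def idf(term, df, n_docs):
    """Classic inverse document frequency, log(N / df); 0.0 for unseen terms."""
    return math.log(n_docs / df[term]) if df[term] else 0.0


# Hypothetical toy collection, not data from the paper.
docs = [
    "web information retrieval",
    "web search engine",
    "feature model for web retrieval",
]
df = document_frequency(docs)
n_docs = len(docs)

for term in ("web", "retrieval", "engine"):
    print(term, df[term], round(idf(term, df, n_docs), 3))
```

A term that appears in every document, such as "web" here, has the highest DF and an IDF of zero, so the choice between the two factors reverses the ranking of common and rare terms.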

Citation:

Zhang Min, Ma Shaoping, Song Ruihua. DF or IDF? On the Use of Primary Feature Model for Web Information Retrieval. Journal of Software, 2005, 16(5): 1012-1020

History
  • Received: October 14, 2003
  • Revised: September 08, 2004