Threshold Optimization with a Small Number of Samples in Adaptive Information Filtering


2003, Vol. 14, No. 10, pp. 1697-1705

Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 69873011, 69935010, 60103014; the National High-Tech Research and Development Plan of China (863 Program) under Grant Nos. 2002AA142090, 2001AA114120



Abstract:

A major challenge in adaptive information filtering is extreme data sparsity, so it is important to learn the optimal threshold while filtering the incoming text stream. This paper presents a novel threshold optimization algorithm. The algorithm learns quickly from a small number of positive samples, and most of the quantities it requires can be updated incrementally, so its computational and memory requirements are low. It is also efficient, robust, and practical. Fudan University's adaptive information filtering system used this algorithm for the first time at the Tenth Text REtrieval Conference (TREC10) and placed third. Its T10U and T10F scores were 0.215 and 0.414, respectively.
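The two scores quoted in the abstract, T10U and T10F, are TREC10 filtering-track measures. As a rough illustration only, the sketch below computes a linear utility, a scaled utility, and an F-beta score from per-topic counts. The formulas are recalled from the TREC 2001 filtering-track definitions and should be checked against the track overview. The function and parameter names are hypothetical and do not come from the paper, and treating the reported 0.215 as a scaled, topic-averaged utility is an assumption.

```python
def t10u(rel_ret: int, nonrel_ret: int) -> int:
    """Linear utility: +2 per relevant retrieved document, -1 per non-relevant one."""
    return 2 * rel_ret - nonrel_ret


def t10su(rel_ret: int, nonrel_ret: int, total_rel: int, min_nu: float = -0.5) -> float:
    """Scaled utility in [0, 1]; min_nu is the assumed lower clipping bound."""
    if total_rel <= 0:
        raise ValueError("total_rel must be positive")
    max_u = 2 * total_rel  # utility of retrieving every relevant document and nothing else
    normalized = max(t10u(rel_ret, nonrel_ret) / max_u, min_nu)
    return (normalized - min_nu) / (1 - min_nu)


def t10f(rel_ret: int, nonrel_ret: int, rel_missed: int, beta: float = 0.5) -> float:
    """F-beta over retrieved documents; beta = 0.5 weights precision above recall."""
    if rel_ret == 0 and nonrel_ret == 0:
        return 0.0  # nothing retrieved: score defined as zero
    b2 = beta * beta
    return (1 + b2) * rel_ret / ((1 + b2) * rel_ret + b2 * rel_missed + nonrel_ret)


if __name__ == "__main__":
    # Hypothetical topic: 20 relevant documents exist; the system retrieved 10 of them
    # plus 5 non-relevant documents, missing the other 10.
    print(t10u(10, 5), t10su(10, 5, 20), t10f(10, 5, 10))
```

Under these definitions a filter that retrieves many non-relevant documents is clipped to a scaled utility of zero rather than going arbitrarily negative, which is why a per-topic average can be reported as a small positive number.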


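Cite this article

Xia Yingju (夏迎炬), Huang Xuanjing (黄萱菁), Hu Tian (胡恬), Wu Lide (吴立德). Threshold Optimization with a Small Number of Samples in Adaptive Information Filtering. Journal of Software (软件学报), 2003, 14(10): 1697-1705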


History

Received: 2002-06-01
Revised: 2002-09-04
