Learning to Rank Algorithm for Microblogs Based on Analysis Features and Dynamic Stepsize
Fund Project: National Key Technology R&D Program of China (2012BAH23B03)


    Abstract:

    Currently, most microblog search methods use the vector space model to compute the relevance between a query and a document, typically weighting terms with the term frequency-inverse document frequency (TF-IDF) statistic. However, using words alone as the unit of microblog search cannot capture how much information a microblog carries, which is often what searchers care about. To address this problem, this paper proposes a learning to rank algorithm for microblogs based on analysis features and dynamic stepsize. First, a set of microblog analysis features is defined; these features are obtained through statistical analysis and can be used to predict user behavior. Second, a method is proposed that computes microblog relevance with the part of speech (POS) as the unit: it applies information entropy to measure the POS information content of a microblog, which is then used to predict the microblog's information content. Finally, building on the existing ListNet learning to rank algorithm, the concept of a dynamic stepsize is introduced to optimize the stepsize during training, yielding a microblog learning to rank algorithm named RDLS (ranking based on dynamic learning stepsize). Experimental results show that, with the same number of iterations and using either direct features alone or direct features combined with analysis features, RDLS trains better models than ListNet and achieves better microblog ranking performance.
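The baseline the abstract starts from is the vector space model with TF-IDF term weights. As a minimal sketch of that baseline (not the paper's code), the Python below assumes posts are already tokenized into word lists and uses raw term frequency times log(N/df) with cosine similarity; other TF-IDF variants (log-scaled tf, smoothed idf) are equally common. The function names and toy data are hypothetical.

```python
import math
from collections import Counter

def build_tfidf(docs):
    """Return (doc_vectors, idf) for a list of tokenized documents.

    Weight = raw term frequency * log(N / document frequency).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf

def query_vector(query, idf):
    """Weight query terms with the collection's idf; unseen terms are dropped."""
    return {t: tf * idf[t] for t, tf in Counter(query).items() if t in idf}

def cosine(u, v):
    """Cosine similarity of two sparse {term: weight} vectors (0.0 if either is all zeros)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical toy collection of three tokenized posts.
docs = [["beijing", "rain", "weather"],
        ["beijing", "traffic"],
        ["rain", "umbrella", "sale"]]
vectors, idf = build_tfidf(docs)
q = query_vector(["beijing", "rain"], idf)
scores = [cosine(q, d) for d in vectors]
```

Here the first post matches both query terms and gets the highest score; nothing in this score reflects how informative a post is, which is the gap the abstract points out.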
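The abstract does not give the exact formula for the POS-based information content. One plausible reading, sketched below purely as an assumption, is the Shannon entropy of a post's part-of-speech tag distribution: a post spread over many tag types scores higher than one that repeats a single tag. The input format (a list of (word, tag) pairs from some Chinese POS tagger), the function name and the tag strings in the example are illustrative, not taken from the paper.

```python
import math
from collections import Counter

def pos_entropy(tagged_tokens):
    """Shannon entropy, in bits, of the POS-tag distribution of one microblog.

    tagged_tokens: list of (word, pos_tag) pairs (assumed tagger output).
    Returns 0.0 for an empty post or a post that uses a single tag.
    """
    counts = Counter(tag for _, tag in tagged_tokens)
    if not counts:
        return 0.0
    total = sum(counts.values())
    # H = sum_t p_t * log2(1 / p_t); written this way so a single tag yields +0.0.
    return sum((c / total) * math.log2(total / c) for c in counts.values())

# Hypothetical tagged posts (tag strings are illustrative).
informative = [("北京", "ns"), ("今天", "t"), ("下", "v"), ("大雨", "n")]  # 4 tags, p = 1/4 each
repetitive = [("哈哈", "e"), ("哈哈", "e")]                              # a single tag
```

Under this assumption, `pos_entropy(informative)` is 2.0 bits and `pos_entropy(repetitive)` is 0.0.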
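RDLS builds on ListNet, whose loss is the cross-entropy between the top-one probabilities induced by the relevance labels and by the model scores. The abstract does not state how RDLS adapts the stepsize, so the sketch below substitutes a different, well-known heuristic, the "bold driver" rule (grow the step after an epoch that lowers the loss; reject the update and shrink the step otherwise), on a linear ListNet model. The function names, the grow/shrink factors and the toy data are assumptions, not the paper's settings.

```python
import math

def log_softmax(scores):
    """Log of the ListNet top-one probabilities, computed stably."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]

def listnet_loss_and_grad(w, queries):
    """ListNet cross-entropy loss and its gradient for a linear scorer s = w . x.

    queries: list of (features, labels); features holds one vector per document.
    """
    loss = 0.0
    grad = [0.0] * len(w)
    for feats, labels in queries:
        scores = [sum(wk * xk for wk, xk in zip(w, x)) for x in feats]
        p_true = [math.exp(v) for v in log_softmax(labels)]
        log_p_model = log_softmax(scores)
        loss -= sum(pt * lp for pt, lp in zip(p_true, log_p_model))
        for x, pt, lp in zip(feats, p_true, log_p_model):
            coeff = math.exp(lp) - pt  # dLoss/ds_j for softmax cross-entropy
            for k, xk in enumerate(x):
                grad[k] += coeff * xk
    return loss, grad

def train_listnet_bold_driver(queries, n_features, epochs=50, step=0.1, grow=1.05, shrink=0.5):
    """Batch gradient descent with a bold-driver step size (assumed rule, not RDLS's)."""
    w = [0.0] * n_features
    loss, grad = listnet_loss_and_grad(w, queries)
    for _ in range(epochs):
        candidate = [wk - step * gk for wk, gk in zip(w, grad)]
        new_loss, new_grad = listnet_loss_and_grad(candidate, queries)
        if new_loss < loss:
            # Loss went down: accept the update and enlarge the step.
            w, loss, grad = candidate, new_loss, new_grad
            step *= grow
        else:
            # Loss went up: keep the old weights and shrink the step.
            step *= shrink
    return w, loss

# Hypothetical toy query: three documents, two features, graded labels 2 > 1 > 0.
queries = [([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]], [2, 1, 0])]
w, final_loss = train_listnet_bold_driver(queries, n_features=2)
scores = [sum(wk * xk for wk, xk in zip(w, x)) for x in queries[0][0]]
```

Rejecting a loss-increasing update, rather than keeping it, is what lets the step grow geometrically without the training diverging.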

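Cite this article:

Zhou Shilong (周诗龙), Xu Jungang (徐俊刚). Learning to rank algorithm for microblogs based on analysis features and dynamic stepsize. Journal of Software (软件学报), 2013, 24(S2): 150-161.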

History
  • Received: 2013-03-15
  • Last revised: 2013-07-11
  • Published online: 2014-01-02
Copyright: Institute of Software, Chinese Academy of Sciences