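Zhou Shilong, Xu Jungang. Learning to rank algorithm for microblogs based on analysis features and dynamic stepsize. Journal of Software (Ruan Jian Xue Bao), 2013, 24(S2): 150-161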
Learning to Rank Algorithm for Microblogs Based on Analysis Features and Dynamic Stepsize
Received: 2013-03-15  Revised: 2013-07-11
Keywords: microblog; ListNet; dynamic stepsize; analysis feature; learning to rank
Funding: National Key Technology R&D Program of China (2012BAH23B03)
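Authors:
Zhou Shilong (School of Computer and Control, University of Chinese Academy of Sciences, Beijing 100190; E-mail: kunlong0909@163.com)
Xu Jungang (School of Computer and Control, University of Chinese Academy of Sciences, Beijing 100190)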
Abstract:
Currently, most microblog search methods use the vector space model to compute the relevance between a query and a document, typically weighting terms with the statistical TF-IDF (term frequency-inverse document frequency) method. However, searching with words alone cannot detect how much information a given microblog contains, which is often exactly what search users care about. To address this problem, a learning to rank algorithm for microblogs based on analysis features and dynamic stepsize is proposed. First, a set of microblog analysis features is defined; these features are obtained through statistical analysis and can be used to predict user behavior. Second, a method is proposed that computes microblog relevance using the part of speech (POS) as the unit; combined with information entropy, it measures the POS information content of a microblog, which is then used to predict the microblog's information content. Finally, building on the existing ListNet learning to rank algorithm, the concept of a dynamic stepsize is introduced so that the stepsize is optimized dynamically during training, yielding a learning to rank algorithm for microblogs named RDLS (Ranking based on Dynamic Learning Stepsize). Experimental results show that, for the same number of iterations and whether using direct features alone or direct features together with analysis features, RDLS trains better models than ListNet and performs better at ranking microblogs.
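The abstract says information entropy is used to measure the POS information content of a microblog, but it does not give the formula. A minimal sketch, assuming the measure is the Shannon entropy of the microblog's POS-tag distribution and that the tags come from some external tagger; the helper name `pos_entropy` and the tag labels are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def pos_entropy(pos_tags):
    """Shannon entropy (in bits) of the POS-tag distribution of one microblog.

    Hypothetical helper: pos_tags is a list of POS labels (e.g. "n", "v", "a")
    produced by an external tagger. The paper's exact definition may differ.
    """
    if not pos_tags:
        return 0.0
    total = len(pos_tags)
    counts = Counter(pos_tags)
    # H = sum_i p_i * log2(1 / p_i), where p_i is the relative frequency of tag i
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# A post made only of nouns has no POS diversity; four distinct tags give 2 bits.
print(pos_entropy(["n", "n", "n", "n"]))
print(pos_entropy(["n", "v", "a", "d"]))
```

Under this assumption, a post whose tokens spread across many parts of speech scores higher than one dominated by a single category.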
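The ListNet objective that RDLS builds on can be sketched as follows: for each query, the relevance labels and the model scores are each turned into top-one probabilities with a softmax, and the loss is their cross entropy, whose gradient with respect to a score is simply P_model(j) - P_true(j). The abstract does not specify how RDLS adjusts the stepsize, so this sketch substitutes a generic "bold driver" schedule (lengthen the step after an update that lowers the loss, shorten it and discard the update otherwise) as a stand-in; it is not the RDLS rule. The function names, the linear scoring model, the full-batch update, the constants `grow=1.05` and `shrink=0.5`, and the toy data are all assumptions.

```python
import math


def log_softmax(values):
    """log of top-one probabilities exp(v_j) / sum_k exp(v_k), computed stably."""
    m = max(values)
    log_total = m + math.log(sum(math.exp(v - m) for v in values))
    return [v - log_total for v in values]


def score(w, x):
    """Linear scoring model z = w . x (an assumption of this sketch)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def query_loss_and_grad(w, features, labels):
    """ListNet cross-entropy loss and its gradient w.r.t. w for one query."""
    p_true = [math.exp(lp) for lp in log_softmax(labels)]
    log_p_model = log_softmax([score(w, x) for x in features])
    loss = -sum(pt * lp for pt, lp in zip(p_true, log_p_model))
    grad = [0.0] * len(w)
    for pt, lp, x in zip(p_true, log_p_model, features):
        # dL/dz_j = P_model(j) - P_true(j); chain rule through z_j = w . x_j
        for k, xk in enumerate(x):
            grad[k] += (math.exp(lp) - pt) * xk
    return loss, grad


def batch_loss_and_grad(w, queries):
    """Sum per-query losses and gradients over the training set."""
    total_loss, total_grad = 0.0, [0.0] * len(w)
    for features, labels in queries:
        loss, grad = query_loss_and_grad(w, features, labels)
        total_loss += loss
        total_grad = [a + b for a, b in zip(total_grad, grad)]
    return total_loss, total_grad


def train_dynamic_stepsize(queries, n_features, epochs=50, step=0.1,
                           grow=1.05, shrink=0.5):
    """Full-batch gradient descent with a bold-driver stepsize (stand-in rule)."""
    w = [0.0] * n_features
    loss, grad = batch_loss_and_grad(w, queries)
    history = [loss]
    for _ in range(epochs):
        candidate = [wi - step * gi for wi, gi in zip(w, grad)]
        new_loss, new_grad = batch_loss_and_grad(candidate, queries)
        if new_loss < loss:
            # Accept the update and lengthen the next step.
            w, loss, grad = candidate, new_loss, new_grad
            step *= grow
        else:
            # Reject the update and retry from the same point with a shorter step.
            step *= shrink
        history.append(loss)
    return w, history


# Toy data: two queries; each document is a 2-feature vector with a relevance grade.
queries = [
    ([[1.0, 0.2], [0.1, 0.9], [0.5, 0.5]], [2.0, 0.0, 1.0]),
    ([[0.9, 0.1], [0.2, 0.8]], [1.0, 0.0]),
]
w, history = train_dynamic_stepsize(queries, n_features=2)
print(w, history[0], history[-1])
```

Because an update is accepted only when it lowers the loss, the recorded loss history in this sketch is non-increasing, whereas plain ListNet with a fixed stepsize offers no such guarantee; this illustrates one reason adapting the stepsize can help, without claiming it matches the paper's schedule.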
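Sponsors: Institute of Software, Chinese Academy of Sciences; China Computer Federation
Editorial office tel.: +86-10-62562563  E-mail: jos@iscas.ac.cn
Copyright Institute of Software, Chinese Academy of Sciences, Journal of Software. All Rights Reserved.
The full-text database of this journal is protected by copyright; it may not be reproduced without permission, and the journal reserves the right to pursue legal action.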