Online Comment Clustering Based on an Improved Semantic Distance

doi:10.13328/j.cnki.jos.004729


Volume 25, Issue 12, 2014, pp. 2777-2789


Author:
YANG Zhen, WANG Lai-Tao, LAI Ying-Xu
College of Computer Science, Beijing University of Technology, Beijing 100124, China


Abstract:

An improved semantic distance for short text is proposed. The new method calculates the semantic distance between two word strings as a balance between the extent of word sequence alignment and the meaning matching between the word strings. First, after linguistic preprocessing, the extent of word sequence alignment is computed by a structural distance, which measures the maximum matching based on the HIT-CIR Tongyici Cilin (extended edition). Then, the meaning matching between the word strings is computed by an improved edit distance, which assigns each word a weight according to its word type. Finally, the semantic distance between the word strings is measured as a balance of the structural distance and the word meaning matching distance. In addition, to eliminate the influence of sentence length, the proposed semantic distance is adjusted using the distinct word count estimated by Heaps' law and Zipf's law. Experimental results show that the presented methods are more efficient than the classical edit distance models.
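The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the word-type weights in `POS_WEIGHTS`, the mixing parameter `alpha`, the Heaps' law constants `k` and `beta`, and the way the length adjustment is applied (division by the estimated distinct word count) are all assumptions made for illustration, since the abstract gives none of these details. The structural distance based on Tongyici Cilin is taken as a precomputed input rather than reproduced here.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, word-type tag)

# Hypothetical weights per word type; the abstract does not state the real values.
POS_WEIGHTS: Dict[str, float] = {"n": 1.0, "v": 1.0, "a": 0.8}
DEFAULT_WEIGHT = 0.3  # assumed weight for all other word types


def weight(token: Token) -> float:
    """Return the (assumed) cost of editing this word, based on its word type."""
    return POS_WEIGHTS.get(token[1], DEFAULT_WEIGHT)


def weighted_edit_distance(a: List[Token], b: List[Token]) -> float:
    """Word-level edit distance in which each operation costs the word's weight."""
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + weight(a[i - 1])
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + weight(b[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1][0] == b[j - 1][0]:
                sub = 0.0  # identical words match at no cost
            else:
                sub = max(weight(a[i - 1]), weight(b[j - 1]))
            d[i][j] = min(
                d[i - 1][j] + weight(a[i - 1]),  # delete a word from a
                d[i][j - 1] + weight(b[j - 1]),  # insert a word from b
                d[i - 1][j - 1] + sub,           # substitute or match
            )
    return d[m][n]


def heaps_vocabulary(n_words: int, k: float = 1.0, beta: float = 0.5) -> float:
    """Heaps' law estimate of the distinct word count: V = k * n^beta."""
    return k * n_words ** beta


def semantic_distance(structural: float, a: List[Token], b: List[Token],
                      alpha: float = 0.5) -> float:
    """Blend a precomputed structural distance with the weighted edit distance,
    then divide by a Heaps' law estimate (an assumed form of length adjustment)."""
    meaning = weighted_edit_distance(a, b)
    raw = alpha * structural + (1.0 - alpha) * meaning
    norm = heaps_vocabulary(len(a) + len(b))
    return raw / norm if norm > 0 else 0.0
```

With these assumed weights, replacing one low-weight function word costs less than replacing a noun or verb, which is the intuition behind weighting words by type.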

Key words: text clustering; online comment; semantic distance; length penalty; Heaps' law; Zipf's law

Citation:

YANG Zhen, WANG Lai-Tao, LAI Ying-Xu. Online comment clustering based on an improved semantic distance. Journal of Software, 2014, 25(12): 2777-2789


History

Received: May 5, 2014
Revised: August 21, 2014
Online: December 4, 2014
