Solution for Automatic Web Review Extraction

Volume 21, Issue 12, 2010, pp. 3220-3236

Author: LIU Wei, YAN Hua-Liang, XIAO Jian-Guo, ZENG Jian-Xun

                    

Abstract:

Web user reviews are an important information source for many popular applications (e.g., monitoring and analysis of public opinion), so they need to be extracted accurately from Web pages. Web user reviews are user-generated content, whose presentation is not restricted by the Web page template, and this raises new challenges. First, the inconsistency of review contents in both the DOM tree and the visual appearance impairs the similarity between review records. Second, the review content in a review record corresponds to a complicated subtree rather than a single node in the DOM tree. To tackle these challenges, a comprehensive solution for the automatic extraction of Web reviews is proposed. The review records are first extracted from Web pages using a level-weighted tree similarity algorithm, and the pure review contents are then extracted from the records by comparing node consistency. Experimental results on news Web sites and forum Web sites indicate that the solution achieves high extraction accuracy and efficiency.

Key words: Web user review; structured data record; Web data extraction
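
The abstract names a level-weighted tree similarity for finding review records but does not give its definition. The following is only a minimal sketch of the general idea, under stated assumptions: trees are compared level by level, each level's tag sequences are scored with `difflib.SequenceMatcher`, and shallower levels get larger weights through a hypothetical `decay` parameter. The `Node` class, the function names, and the weighting scheme are illustrative and are not taken from the paper.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import List


@dataclass
class Node:
    """A simplified DOM node: a tag name plus child nodes (illustrative only)."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def level_tags(root: Node) -> List[List[str]]:
    """Return the tag names of the tree grouped by depth, root level first."""
    levels = []
    frontier = [root]
    while frontier:
        levels.append([n.tag for n in frontier])
        frontier = [c for n in frontier for c in n.children]
    return levels


def level_weighted_similarity(t1: Node, t2: Node, decay: float = 0.5) -> float:
    """Weighted average of per-level tag-sequence similarity.

    Assumption: level d gets weight decay**d, so differences deep in the
    tree (where free-form review content lives) count less than
    differences near the record root.
    """
    l1, l2 = level_tags(t1), level_tags(t2)
    total = 0.0
    weight_sum = 0.0
    for d in range(max(len(l1), len(l2))):
        w = decay ** d
        a = l1[d] if d < len(l1) else []
        b = l2[d] if d < len(l2) else []
        total += w * SequenceMatcher(None, a, b).ratio()
        weight_sum += w
    return total / weight_sum


# Two records sharing a template but differing in user-written content,
# and one unrelated subtree.
r1 = Node("div", [Node("span"), Node("p", [Node("b")])])
r2 = Node("div", [Node("span"), Node("p", [Node("img"), Node("a")])])
r3 = Node("table", [Node("tr"), Node("td")])

sim_same_template = level_weighted_similarity(r1, r2)
sim_unrelated = level_weighted_similarity(r1, r3)
```

In this sketch, `r1` and `r2` still score as similar because they differ only at the deepest level, which matches the abstract's point that inconsistent review content should not prevent records from being recognized as siblings.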

Citation:

LIU Wei, YAN Hua-Liang, XIAO Jian-Guo, ZENG Jian-Xun. Solution for automatic Web review extraction. Journal of Software, 2010, 21(12): 3220-3236


History

Received: September 06, 2010
Revised: November 24, 2010
