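Location Inference Method in Online Social Media with Big Data
Authors: Wang Kai (王凯), Yu Wei (余伟), Yang Sha (杨莎), Wu Min (吴敏), Hu Yahui (胡亚慧), Li Shijun (李石君)
Funding:

National Natural Science Foundation of China (61272109, 61502350); Fundamental Research Funds for the Central Universities (2042014kf0057); Natural Science Foundation of Hubei Province (2014CFB289)
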
    Abstract:

    With the rapid development of online social media and the wide adoption of location-enabled devices, geographic location has become a high-quality information resource in social media big data and is widely used in disease control, population mobility analysis, and targeted advertising. However, location data on social media are very sparse, because many users either do not specify a location or cannot specify it accurately. To address this sparsity problem, this paper proposes UGC-LI (user generate content driven location inference method), a location inference method based on user-generated content that infers the locations of social media users and of the texts they generate, providing data support for location-based personalized information services. Using local words extracted from user-generated texts, the method builds a probabilistic model based on the differences in the geographic distribution of words and on the users' social graph, and infers user locations at multiple levels of geographic granularity. In addition, a location-based parameterized language model is proposed to determine the city from which a user-generated text was posted. Evaluation experiments on a real-world dataset show that UGC-LI locates 64.2% of users within an offset distance of 15 km and infers the user's home city with 81.3% accuracy; it also correctly identifies the posting city for 32.7% of user-generated texts, a marked improvement over existing methods.
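
    The abstract reports accuracy as the share of users located within a 15 km offset distance. The sketch below shows one way such a figure could be computed. It assumes great-circle (haversine) distance between predicted and true coordinates, which this page does not confirm as the paper's distance measure. The function names and the sample coordinates are hypothetical and purely illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def accuracy_within(predicted, actual, threshold_km=15.0):
    """Fraction of users whose predicted location is within threshold_km of the true one."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        return 0.0
    hits = sum(haversine_km(p, a) <= threshold_km
               for p, a in zip(predicted, actual))
    return hits / len(predicted)


# Illustrative data only: two users whose true location is central Wuhan.
true_locations = [(30.5928, 114.3055), (30.5928, 114.3055)]
# First prediction is about 1 km off; second is placed in Beijing.
predicted_locations = [(30.6000, 114.3000), (39.9042, 116.4074)]

print(accuracy_within(predicted_locations, true_locations, threshold_km=15.0))
```

    Lowering or raising `threshold_km` gives the same metric at a finer or coarser granularity. City-level accuracy, as also reported in the abstract, would instead compare predicted and true city labels directly.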

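Cite this article

Wang K, Yu W, Yang S, Wu M, Hu YH, Li SJ. Location inference method in online social media with big data. Ruan Jian Xue Bao/Journal of Software, 2015,26(11):2951-2963 (in Chinese with English abstract).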

History
  • Received: 2015-05-31
  • Last revised: 2015-08-26
  • Published online: 2015-11-04