Location Inference Method in Online Social Media with Big Data

    Abstract:

    Geographic location is a high-quality signal in social media big data. With the rapid development of online social media and the prevalence of location-enabled mobile devices, it has been widely adopted in disease control, population mobility analysis, and location-targeted ad delivery. However, location data are quite sparse, because users often cannot specify their locations accurately. To overcome this data sparsity problem, this paper proposes UGC-LI, a user-generated-content-driven location inference method that infers where users are located and where social texts are created. The method can provide supporting data for location-based personalized information services. A probability model is constructed that jointly considers the distribution of location words and the users' social graph, using local words extracted from user-generated texts, to locate users at multiple granularities. Further, a parameterized location-based language model is presented to determine the city in which a tweet is published. Experimental results on a real-world dataset demonstrate that the new method outperforms existing algorithms: 64.2% of users are located within a displacement distance of 15 km, 81.3% of users' home cities are correctly identified, and 32.7% of the cities where tweets were posted are correctly identified.
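    The "within 15 km displacement distance" figure can be read as the fraction of users whose inferred coordinates fall within a threshold of their true coordinates. The sketch below shows one common way to compute such a metric, assuming great-circle (haversine) distance on a spherical Earth. The function names `haversine_km` and `accuracy_within` are hypothetical, and the abstract does not state which distance formula the authors actually used.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical Earth is an assumption


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def accuracy_within(predicted, actual, threshold_km=15.0):
    """Fraction of users whose predicted point lies within threshold_km of the true point.

    predicted and actual are equal-length lists of (lat, lon) tuples in degrees.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        return 0.0
    hits = sum(
        1 for p, a in zip(predicted, actual)
        if haversine_km(p[0], p[1], a[0], a[1]) <= threshold_km
    )
    return hits / len(predicted)


if __name__ == "__main__":
    # Hypothetical toy data: one prediction about 11 km off, one about 111 km off.
    pred = [(0.0, 0.0), (0.0, 0.0)]
    true = [(0.0, 0.1), (0.0, 1.0)]
    print(accuracy_within(pred, true, threshold_km=15.0))
```

    Under this reading, a value of 0.642 from `accuracy_within` would correspond to the 64.2% reported in the abstract; the city-level figures would instead compare predicted and true city labels for equality.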

Citation

Wang K, Yu W, Yang S, Wu M, Hu YH, Li SJ. Location inference method in online social media with big data. Ruan Jian Xue Bao/Journal of Software, 2015,26(11):2951-2963
History
  • Received: May 31, 2015
  • Revised: August 26, 2015
  • Online: November 4, 2015