TWE-NMF Topic Model-based Approach for Mashup Service Clustering

CLC Number: TP311

    Abstract:

    With the development of the Internet and service-oriented technology, a new type of Web application, the Mashup service, has become popular on the Internet and is growing rapidly. How to find high-quality services among a large number of Mashup services has become a focus of attention. Studies have shown that finding and clustering services with similar functions can effectively improve the accuracy and efficiency of service discovery. Current methods mainly mine the hidden functional information in Mashup services and then apply a specific clustering algorithm such as K-means. However, Mashup service documents are usually short texts, which traditional mining algorithms such as LDA have difficulty representing, so these methods struggle to achieve satisfactory clustering results. To solve this problem, this study proposes a non-negative matrix factorization model combining tags and word embedding (TWE-NMF) to discover topics for Mashup services. The method first normalizes the Mashup services, then uses a Dirichlet process multinomial mixture model based on improved Gibbs sampling to automatically estimate the number of topics. Next, it combines word embedding and service tag information with non-negative matrix factorization to calculate Mashup topic features. A spectral clustering algorithm is then used to cluster the Mashup services. Finally, the performance of the method is comprehensively evaluated. The experimental results show that, compared with existing service clustering methods, the proposed method achieves a significant improvement in evaluation indicators such as precision, recall, F-measure, purity, and entropy.
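Two of the evaluation indicators named in the abstract, purity and entropy, can be illustrated with a minimal Python sketch. It assumes the common textbook definitions (purity as the share of items in the majority class of their cluster, entropy as the size-weighted mean of per-cluster class entropy in bits); the abstract does not state the exact formulas the study uses, and the function names and toy data here are hypothetical.

```python
from collections import Counter
from math import log2


def _class_counts(clusters, labels):
    """Group the true labels by predicted cluster id."""
    by_cluster = {}
    for c, y in zip(clusters, labels):
        by_cluster.setdefault(c, Counter())[y] += 1
    return by_cluster


def purity(clusters, labels):
    """Fraction of items that fall in the majority class of their cluster."""
    counts = _class_counts(clusters, labels)
    majority = sum(max(cnt.values()) for cnt in counts.values())
    return majority / len(labels)


def entropy(clusters, labels):
    """Size-weighted mean of per-cluster class entropy (bits); lower is better."""
    counts = _class_counts(clusters, labels)
    n = len(labels)
    total = 0.0
    for cnt in counts.values():
        size = sum(cnt.values())
        h = -sum((k / size) * log2(k / size) for k in cnt.values())
        total += (size / n) * h
    return total


# Hypothetical toy data: six services, two predicted clusters.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["map", "map", "social", "social", "social", "social"]
print(round(purity(clusters, labels), 4), round(entropy(clusters, labels), 4))
```

Under these definitions a higher purity and a lower entropy both indicate clusters that align more closely with the reference categories.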

Citation:

Lu JW, Zhao W, Zhang YM, Liang QH, Xiao G. TWE-NMF topic model-based approach for Mashup service clustering. Ruan Jian Xue Bao/Journal of Software, 2023, 34(6): 2727-2748.

History
  • Received: November 02, 2020
  • Revised: January 29, 2021
  • Online: December 08, 2022
  • Published: June 06, 2023